{"id":32407,"date":"2026-03-11T11:18:56","date_gmt":"2026-03-11T18:18:56","guid":{"rendered":"https:\/\/www.pingcap.com\/?p=32407"},"modified":"2026-03-13T12:23:15","modified_gmt":"2026-03-13T19:23:15","slug":"how-to-build-an-ai-memory-architecture-that-actually-remembers","status":"publish","type":"post","link":"https:\/\/www.pingcap.com\/ko\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/","title":{"rendered":"Building a Voice-First AI Journal: What I Learned About AI Memory, Vector Search, and TiDB"},"content":{"rendered":"<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Key_Takeaways\"><\/span>Key Takeaways<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Off-the-shelf memory frameworks can silently discard the details that matter most.<\/li>\n\n\n\n<li>A three-layer AI memory architecture delivers far better recall than any single abstraction.<\/li>\n\n\n\n<li>TiDB&#8217;s native vector search eliminates the two-database overhead of a Postgres + Pinecone setup.<\/li>\n\n\n\n<li>Model choice for synthesis tasks is a trust decision, not a cost decision.<\/li>\n<\/ul>\n<\/blockquote>\n\n\n\n<p>I was talking to Claude the other day \u2014 not about code or some technical problem. I was venting about work, about life. And Claude responded with something so personal, so specific to my situation, that I stopped and stared at it. It referenced my daughter by name. It brought up something I&#8217;d been stressed about from a conversation weeks earlier. It connected dots between completely separate chats.<\/p>\n\n\n\n<p>That feeling of being truly remembered by an AI? That&#8217;s a product.<\/p>\n\n\n\n<p>So I built <a href=\"https:\/\/speak2me.io\">Speak2Me<\/a>, a voice-first AI journal companion. 
You talk to it like a friend, and it actually remembers your story \u2014 not with generic responses like &#8220;that sounds frustrating,&#8221; but with real, personal context that references your life, your people, and your patterns.<\/p>\n\n\n\n<p>The first version took about two hours to build. Making it actually work took the rest of the week. Because here&#8217;s the thing nobody tells you about <a href=\"https:\/\/www.pingcap.com\/ko\/ai\/\">AI memory<\/a>: It&#8217;s really hard to get right.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"AI_Memory_Architecture_The_Promise_vs_The_Reality\"><\/span>AI Memory Architecture: The Promise vs. The Reality<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The concept was straightforward: Open the app and it just gets you. It remembers your partner&#8217;s name, asks about that job stress you mentioned last week, and checks whether the baby is sleeping through the night yet.<\/p>\n\n\n\n<p>I wired everything up \u2014 <a href=\"https:\/\/www.hume.ai\/\">Hume EVI<\/a> for voice, <a href=\"https:\/\/github.com\/mem0ai\/mem0\">Mem0<\/a> for long-term memory, <a href=\"https:\/\/www.pingcap.com\/ko\/tidb\/\">TiDB<\/a> for the database (relational data and <a href=\"https:\/\/www.pingcap.com\/ko\/blog\/integrating-vector-search-into-tidb-for-ai-applications\/\">vector search in one<\/a>), Claude as the reasoning layer, and <a href=\"https:\/\/vercel.com\">Vercel<\/a> for deployment. Sent the link to a few testers. Felt good about myself.<\/p>\n\n\n\n<p>Then I used it for real. Told it personal details \u2014 my income, my family, my goals for the year. Opened it the next session expecting a deeply personal experience.<\/p>\n\n\n\n<p>It had no idea who I was. Zero context. 
The entire product promise was broken.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"When_Your_AI_Memory_Architecture_Layer_Forgets\"><\/span>When Your AI Memory Architecture Layer Forgets<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>I was using Mem0 for long-term memory. If you haven&#8217;t encountered it, Mem0 is an open-source memory framework with over 40,000 GitHub stars. The idea is compelling: Feed it conversations, it extracts important facts, and you recall those facts later.<\/p>\n\n\n\n<p>During a test conversation, I provided exact financial details \u2014 my base salary and bonus, down to the dollar. I then checked what Mem0 actually stored.<\/p>\n\n\n\n<p>It had extracted a vague sentence about &#8220;wanting to discuss income.&#8221; The actual numbers were gone.<\/p>\n\n\n\n<p>This isn&#8217;t a bug in Mem0&#8217;s design \u2014 it&#8217;s a limitation of how memory extraction works. Mem0 uses a smaller language model internally (GPT-4o-mini) to decide what&#8217;s worth remembering, and smaller models are aggressive summarizers. They capture the gist and discard the specifics. For casual chatbot memory, that tradeoff might be acceptable. For a product where remembering exact life details is the value proposition, it&#8217;s a dealbreaker.<\/p>\n\n\n\n<p>I ran more tests with family details, career plans, specific names and dates. Some things it captured. Others it mangled or skipped entirely. 
There was no way to predict what it would retain, because I didn&#8217;t control the extraction model.<\/p>\n\n\n\n<p><strong>If the memory layer is the product, you can&#8217;t outsource it to someone else&#8217;s black box.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"The_Hallucination_Problem_Who_Is_Lily\"><\/span>The Hallucination Problem: Who Is Lily?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>While debugging the Mem0 issue, I made another mistake that could have been far worse.<\/p>\n\n\n\n<p>To save costs, I was using GPT-4o-mini to synthesize user profiles \u2014 taking all conversations and generating a summary document of who the user is, what they care about, and who&#8217;s important in their life. This profile gets injected into every future conversation as context.<\/p>\n\n\n\n<p>I ran the synthesis on my test data and read the output. It said my daughter&#8217;s name was &#8220;Lily&#8221; and my partner was &#8220;Sarah.&#8221;<\/p>\n\n\n\n<p>Neither name is correct. GPT-4o-mini fabricated plausible-sounding names when the real names simply hadn&#8217;t been mentioned yet. Instead of writing &#8220;not yet mentioned,&#8221; it invented details and presented them as fact.<\/p>\n\n\n\n<p>Imagine opening your personal journal companion and hearing it say &#8220;How&#8217;s Lily doing?&#8221; when your daughter&#8217;s actual name is completely different. That&#8217;s not a bug \u2014 it&#8217;s a trust-destroying moment you can never recover from.<\/p>\n\n\n\n<p>I switched immediately to Claude Haiku 3.5 for profile synthesis and added strict guardrails: Never invent, guess, or infer names, numbers, or details not explicitly stated in the conversations. If something hasn&#8217;t been mentioned, write &#8220;not yet mentioned.&#8221;<\/p>\n\n\n\n<p>Model choice for synthesis tasks isn&#8217;t a cost optimization. It&#8217;s a trust decision. 
One hallucinated family member name and your user is gone forever.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Building_a_Three-Layer_AI_Memory_Architecture\"><\/span>Building a Three-Layer AI Memory Architecture<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>After these failures, I rethought the entire memory architecture from scratch. The solution required three complementary layers.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Layer 1: The User Profile<\/h3>\n\n\n\n<p>After every conversation, Claude Haiku reads all past transcripts and generates a synthesized document \u2014 who the user is, their job, the important people in their life, their stressors, their goals. This document gets injected into the system prompt for every future session. It&#8217;s how the AI &#8220;knows&#8221; you before you say a word.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Layer 2: Per-Exchange Vector Search<\/h3>\n\n\n\n<p>This is where the biggest improvement happened.<\/p>\n\n\n\n<p>Originally, I was embedding entire conversation transcripts as single vectors. A 20-minute conversation covering salary, weekend plans, and a family wedding all became one vector \u2014 a single point in mathematical space representing the average of all those topics blended together.<\/p>\n\n\n\n<p>When I searched for &#8220;salary,&#8221; it would find that conversation, but it also pulled up every other long conversation with similarly diluted vectors. The signal was buried.<\/p>\n\n\n\n<p>The fix was chunking at the exchange level. One user message plus its AI response equals one chunk. Each chunk gets its own embedding vector. Now when I search for &#8220;salary,&#8221; it finds the <em>exact<\/em> exchange where salary was discussed \u2014 not the whole conversation, but the precise moment.<\/p>\n\n\n\n<p>It&#8217;s the difference between searching a book by title versus having every individual page indexed. 
The recall quality improvement was dramatic. (For even better retrieval, TiDB also supports <a href=\"https:\/\/www.pingcap.com\/ko\/blog\/introducing-full-text-search-for-tidb\/\">full-text search for hybrid retrieval<\/a> \u2014 combining keyword matching with vector similarity \u2014 which I&#8217;m planning to integrate next.)<\/p>\n\n\n\n<p>I&#8217;m using OpenAI&#8217;s <code>text-embedding-3-large<\/code> model (3,072 dimensions) and storing the vectors in <a href=\"https:\/\/www.pingcap.com\/ko\/tidb-cloud-serverless\/\">TiDB<\/a>, which supports <a href=\"https:\/\/www.pingcap.com\/ko\/blog\/tidb-vector-search-public-beta\/\">vector search natively<\/a>. When the AI needs to recall something during a live conversation, it searches these chunks using <a href=\"https:\/\/www.pingcap.com\/ko\/article\/understanding-the-cosine-similarity-formula\/\">cosine distance<\/a>, where a smaller value means a closer match. The cost is negligible \u2014 less than ten cents per user per year for embeddings.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Layer 3: Raw Transcripts<\/h3>\n\n\n\n<p>Every word, stored unmodified. This is the ground truth that never gets summarized, compressed, or distorted by a model. If the profile synthesis misses something or the vector search returns an unexpected result, the raw data is always there.<\/p>\n\n\n\n<p>After validating this three-layer approach, I removed Mem0 entirely. Not because it&#8217;s bad software \u2014 but once the architecture was working, it wasn&#8217;t adding value. 
It was just another dependency sitting between me and my data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_I_Chose_TiDB_Over_Postgres_Pinecone\"><\/span>Why I Chose TiDB Over Postgres + Pinecone<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The database choice deserves its own section because it addresses one of the most common architectural patterns in <a href=\"https:\/\/www.pingcap.com\/ko\/playbook-embed-vector-db-build-rag\/\">RAG applications<\/a> \u2014 and why I think that pattern is wrong for many use cases.<\/p>\n\n\n\n<p>Every RAG tutorial prescribes the same stack: Postgres for your relational data, <a href=\"https:\/\/www.pinecone.io\/\">Pinecone<\/a> for your vectors. Two databases. Two bills. Sync jobs between them.<\/p>\n\n\n\n<p>Here&#8217;s the actual query that runs when the AI needs to recall a memory in Speak2Me:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT\n  e.title,\n  e.top_emotions,\n  c.chunk_text,\n  VEC_COSINE_DISTANCE(c.embedding, ?) AS relevance\nFROM s2m_transcript_chunks c\nJOIN s2m_journal_entries e ON c.entry_id = e.id\nWHERE c.user_id = ?\n  AND e.created_at &gt; DATE_SUB(NOW(), INTERVAL 30 DAY)\nORDER BY relevance\nLIMIT 5<\/code><\/pre>\n\n\n\n<p>Vector search. Date filtering. User scoping. A JOIN to pull full context. <strong>One query. One network hop.<\/strong> (See the full list of <a href=\"https:\/\/docs.pingcap.com\/tidbcloud\/vector-search-functions-and-operators\/\">vector functions and operators<\/a> TiDB supports.)<\/p>\n\n\n\n<p>With a Postgres + Pinecone setup, that same operation becomes: Call Pinecone with the vector, get back chunk IDs, call Postgres with those IDs, and join the results in your application code. 
Two round trips, two failure points, and the join logic lives in JavaScript instead of the database optimizer.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Pre-Filtering Changes Everything<\/h3>\n\n\n\n<p>Vector search is computationally expensive \u2014 comparing a query vector against millions of stored vectors takes real compute. TiDB filters by <code>user_id<\/code> and date range <strong>first<\/strong> using standard indexes. Fast and cheap. Then it runs the vector comparison on that much smaller subset.<\/p>\n\n\n\n<p>Many <a href=\"https:\/\/www.pingcap.com\/ko\/compare\/best-vector-database\/\">dedicated vector databases<\/a> do the opposite: They search all vectors first, then filter out results whose metadata doesn&#8217;t match after the fact. At scale, that difference is significant.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Strong Consistency for Real-Time AI<\/h3>\n\n\n\n<p>During a conversation, the AI extracts a fact from something you just said, stores it, and may need to recall it 30 seconds later in the same session. With a Postgres + Pinecone architecture, you&#8217;re managing sync lag \u2014 write to Postgres, trigger a job to update Pinecone, hope it finishes before the next recall. Eventual consistency headaches.<\/p>\n\n\n\n<p>With <a href=\"https:\/\/www.pingcap.com\/ko\/tidb-cloud-serverless\/\">TiDB<\/a>, I write the embedding and it&#8217;s immediately searchable. Same transaction. No lag, no sync jobs, no &#8220;read your own writes&#8221; issues.<\/p>\n\n\n\n<p>One database. Vectors next to the data they describe. Ship faster, debug easier. 
(For a deeper look, see our architecture guide: Why <a href=\"https:\/\/www.pingcap.com\/ko\/blog\/what-genai-means-for-your-application-data-architecture\/\">unified data architectures matter for GenAI<\/a>.)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"AI_Memory_Architecture_Solving_the_Latency_Problem\"><\/span>AI Memory Architecture: Solving the Latency Problem<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Even after fixing the memory architecture, there was a UX-breaking issue: Latency.<\/p>\n\n\n\n<p>When the AI needed to recall something, it would start responding immediately \u2014 confidently, specifically, and often <em>wrong<\/em>. Then, 10\u201320 seconds later when the vector search results arrived, it would correct itself mid-sentence.<\/p>\n\n\n\n<p>That moment destroys the product promise. You&#8217;re not talking to something that knows you \u2014 you&#8217;re watching a computer look you up.<\/p>\n\n\n\n<p>The solution was to move memory retrieval from query time to session start. Now when a conversation ends, Claude Haiku extracts key facts synchronously in about 500ms. Not just names and dates, but the kind of details a friend would remember: Specific restaurants, upcoming interviews, goals mentioned in passing.<\/p>\n\n\n\n<p>When you open the app next time, the dashboard prefetches your profile summary and the last 20 entries of quick facts in the background. By the time you speak, the AI has everything in context. No tool calls. 
No waiting.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><\/th><th>Session End<\/th><th>Session Start<\/th><th>Memory Recall<\/th><\/tr><\/thead><tbody><tr><td><strong>Before<\/strong><\/td><td>Instant<\/td><td>~2s<\/td><td>5\u201310s (tool call)<\/td><\/tr><tr><td><strong>After<\/strong><\/td><td>+500ms<\/td><td>Instant<\/td><td>Rarely needed<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The recall tool still exists for older memories \u2014 &#8220;What did I say three months ago about&#8230;&#8221; \u2014 but for anything recent, the AI just knows. It costs more tokens, but the first time the AI remembers something instantly, with no pause or correction, that&#8217;s the product.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"The_Voice_Echo_From_Hell\"><\/span>The Voice Echo From Hell<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Speak2Me is voice-first, powered by Hume EVI \u2014 which handles speech-to-text, emotion detection, LLM routing, and text-to-speech in a single WebSocket connection. When you speak, Hume detects 48+ dimensions of vocal expression, so when you sound stressed, the AI adjusts its response accordingly.<\/p>\n\n\n\n<p>But here&#8217;s a problem nobody documents: When the AI speaks through your phone&#8217;s speaker, the microphone picks up that audio, the AI transcribes its own speech, and responds to itself. An infinite feedback loop.<\/p>\n\n\n\n<p>On a native iOS app, the OS provides hardware-level acoustic echo cancellation. 
On a web app running in a mobile browser, you&#8217;re at the mercy of whatever the browser implements \u2014 and mobile Safari is inconsistent at best.<\/p>\n\n\n\n<p>After trying microphone muting (which kills the ability to interrupt naturally), I settled on the browser&#8217;s built-in audio constraints:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>echoCancellation: true,\nnoiseSuppression: true,\nautoGainControl: true<\/code><\/pre>\n\n\n\n<p>On desktop, this works well. On mobile, it&#8217;s acceptable at lower volumes. The real solution is a native iOS app with system-level echo cancellation \u2014 that&#8217;s coming.<\/p>\n\n\n\n<p>If you&#8217;re building real-time voice AI on the web, budget significantly more time for audio engineering than you expect. This problem space is essentially uncharted.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Whats_Next\"><\/span>What&#8217;s Next<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p><a href=\"https:\/\/speak2me.io\">Speak2Me<\/a> is live. The immediate priority is encryption \u2014 users are sharing their most personal thoughts, and journal transcripts need to be encrypted at rest. After that, native iOS to solve the echo problem at the hardware level and add push notifications, background audio, and biometric authentication.<\/p>\n\n\n\n<p>The memory system will keep improving, but only with real conversation data flowing through it. If you&#8217;re a developer building anything with AI memory, I hope the architectural failures I documented here save you some time. 
If you want to go deeper on choosing the right <a href=\"https:\/\/www.pingcap.com\/ko\/article\/the-next-leap-in-data-management-unifying-ai-workloads-with-vector-databases\/\">data infrastructure for AI applications<\/a>, or see how I applied similar patterns in a <a href=\"https:\/\/www.pingcap.com\/ko\/blog\/privacy-first-ai-building-voice-to-text-app-tidb-claude\/\">privacy-first voice-to-text app<\/a> and an <a href=\"https:\/\/www.pingcap.com\/ko\/blog\/build-ai-powered-life-simulator-embeddings-branching-tidb\/\">AI-powered life simulator<\/a>, those deep dives are worth a read.<\/p>\n\n\n\n<p>And if you want to try Speak2Me, go talk to it. Tell it something real. Come back tomorrow and see if it remembers.<\/p>\n\n\n\n<p><em><a href=\"https:\/\/tidbcloud.com\/free-trial\/\">Start building with TiDB Cloud Starter<\/a> \u2014 vector search, SQL joins, and strong consistency in one MySQL-compatible database.<\/em><\/p>","protected":false},"excerpt":{"rendered":"<p>I was talking to Claude the other day \u2014 not about code or some technical problem. I was venting about work, about life. And Claude responded with something so personal, so specific to my situation, that I stopped and stared at it. It referenced my daughter by name. 
It brought up something I&#8217;d been stressed [&hellip;]<\/p>\n","protected":false},"author":324,"featured_media":32456,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ub_ctt_via":"","footnotes":""},"categories":[436],"tags":[476,147,298,111,297],"class_list":["post-32407","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorial","tag-ai-memory","tag-distributed-sql","tag-rag","tag-tidb","tag-vector-search"],"acf":[],"featured_image_src":"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png","author_info":{"display_name":"Chris Dabatos","author_link":"https:\/\/www.pingcap.com\/ko\/blog\/author\/chris-dabatos\/"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>AI Memory Architecture: How to Build One That Actually Works<\/title>\n<meta name=\"description\" content=\"Off-the-shelf AI memory tools lose critical details. Learn the three-layer AI memory architecture and TiDB setup that solved the problem.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AI Memory Architecture: How to Build One That Actually Works\" \/>\n<meta property=\"og:description\" content=\"Off-the-shelf AI memory tools lose critical details. 
Learn the three-layer AI memory architecture and TiDB setup that solved the problem.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pingcap.com\/ko\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/\" \/>\n<meta property=\"og:site_name\" content=\"TiDB\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"article:published_time\" content=\"2026-03-11T18:18:56+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-13T19:23:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2026\/03\/13121133\/tidb_1200x627.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2400\" \/>\n\t<meta property=\"og:image:height\" content=\"1254\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Chris Dabatos\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/static.pingcap.com\/files\/2026\/03\/13121147\/tidb_twitter_1600x900.png\" \/>\n<meta name=\"twitter:creator\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Chris Dabatos\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/\"},\"author\":{\"name\":\"Chris Dabatos\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/4d7ecdb90868256414855723f838c9e0\"},\"headline\":\"Building a Voice-First AI Journal: What I Learned About AI Memory, Vector Search, and TiDB\",\"datePublished\":\"2026-03-11T18:18:56+00:00\",\"dateModified\":\"2026-03-13T19:23:15+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/\"},\"wordCount\":2121,\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png\",\"keywords\":[\"AI Memory\",\"Distributed SQL\",\"RAG\",\"TiDB\",\"Vector Search\"],\"articleSection\":[\"Tutorial\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/\",\"url\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/\",\"name\":\"AI Memory Architecture: How to Build One That Actually 
Works\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png\",\"datePublished\":\"2026-03-11T18:18:56+00:00\",\"dateModified\":\"2026-03-13T19:23:15+00:00\",\"description\":\"Off-the-shelf AI memory tools lose critical details. Learn the three-layer AI memory architecture and TiDB setup that solved the problem.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#primaryimage\",\"url\":\"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png\",\"width\":3600,\"height\":1200},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pingcap.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building a Voice-First AI Journal: What I Learned About AI Memory, Vector Search, and 
TiDB\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pingcap.com\/#website\",\"url\":\"https:\/\/www.pingcap.com\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at Scale\",\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pingcap.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pingcap.com\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\/\/www.pingcap.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/facebook.com\/pingcap2015\",\"https:\/\/x.com\/PingCAP\",\"https:\/\/linkedin.com\/company\/pingcap\",\"https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/4d7ecdb90868256414855723f838c9e0\",\"name\":\"Chris Dabatos\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"caption\":\"Chris Dabatos\"},\"description\":\"Developer Advocate\",\"url\":\"https:\/\/www.pingcap.com\/ko\/blog\/author\/chris-dabatos\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"AI Memory Architecture: How to Build One That Actually Works","description":"Off-the-shelf AI memory tools lose critical details. Learn the three-layer AI memory architecture and TiDB setup that solved the problem.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pingcap.com\/ko\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/","og_locale":"ko_KR","og_type":"article","og_title":"AI Memory Architecture: How to Build One That Actually Works","og_description":"Off-the-shelf AI memory tools lose critical details. Learn the three-layer AI memory architecture and TiDB setup that solved the problem.","og_url":"https:\/\/www.pingcap.com\/ko\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","article_published_time":"2026-03-11T18:18:56+00:00","article_modified_time":"2026-03-13T19:23:15+00:00","og_image":[{"width":2400,"height":1254,"url":"https:\/\/static.pingcap.com\/files\/2026\/03\/13121133\/tidb_1200x627.png","type":"image\/png"}],"author":"Chris Dabatos","twitter_card":"summary_large_image","twitter_image":"https:\/\/static.pingcap.com\/files\/2026\/03\/13121147\/tidb_twitter_1600x900.png","twitter_creator":"@PingCAP","twitter_site":"@PingCAP","twitter_misc":{"Written by":"Chris Dabatos","Est. 
reading time":"11\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#article","isPartOf":{"@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/"},"author":{"name":"Chris Dabatos","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/4d7ecdb90868256414855723f838c9e0"},"headline":"Building a Voice-First AI Journal: What I Learned About AI Memory, Vector Search, and TiDB","datePublished":"2026-03-11T18:18:56+00:00","dateModified":"2026-03-13T19:23:15+00:00","mainEntityOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/"},"wordCount":2121,"publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png","keywords":["AI Memory","Distributed SQL","RAG","TiDB","Vector Search"],"articleSection":["Tutorial"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/","url":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/","name":"AI Memory Architecture: How to Build One That Actually 
Works","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#primaryimage"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png","datePublished":"2026-03-11T18:18:56+00:00","dateModified":"2026-03-13T19:23:15+00:00","description":"Off-the-shelf AI memory tools lose critical details. Learn the three-layer AI memory architecture and TiDB setup that solved the problem.","breadcrumb":{"@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#primaryimage","url":"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png","width":3600,"height":1200},{"@type":"BreadcrumbList","@id":"https:\/\/www.pingcap.com\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingcap.com\/"},{"@type":"ListItem","position":2,"name":"Building a Voice-First AI Journal: What I Learned About AI Memory, Vector Search, and TiDB"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"\ud2f0DB","description":"TiDB | SQL at 
Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]},{"@type":"Person","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/4d7ecdb90868256414855723f838c9e0","name":"Chris Dabatos","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/","url":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","contentUrl":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","caption":"Chris Dabatos"},"description":"Developer Advocate","url":"https:\/\/www.pingcap.com\/ko\/blog\/author\/chris-dabatos\/"}]}},"grav_blocks":false,"card_markup":"<a class=\"card-resource bg-white\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/how-to-build-an-ai-memory-architecture-that-actually-remembers\/\"><div class=\"card-resource__image-container\"><img class=\"card-resource__image\" alt=\"tidb_feature_1800x600\" src=\"https:\/\/static.pingcap.com\/files\/2026\/03\/13121118\/tidb_feature_1800x600.png\" loading=\"lazy\" width=3600 height=1200 
\/><\/div><div class=\"card-resource__content-container\"><div class=\"card-resource__content-head\"><div class=\"card-resource__category\">Tutorial<\/div><\/div><h5 class=\"card-resource__title\">Building a Voice-First AI Journal: What I Learned About AI Memory, Vector Search, and TiDB<\/h5><\/div><\/a>","_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/32407","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/users\/324"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/comments?post=32407"}],"version-history":[{"count":24,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/32407\/revisions"}],"predecessor-version":[{"id":32468,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/32407\/revisions\/32468"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media\/32456"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=32407"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/categories?post=32407"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tags?post=32407"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}