Key Takeaways
- mem9 started as a customer request in March 2026, not a roadmap item. We shipped a prototype before we wrote a plan.
- Agent memory is not a storage problem. It is an engineering problem at the intersection of ingestion, ranking, evaluation, and product judgment.
- A memory API alone is not a product. People want to see, inspect, trust, and correct what an agent remembers.
- mem9 runs on TiDB Cloud, the same substrate behind TiDB Cloud Zero.
In early March 2026, a customer asked us for something that sounded simple and turned out to be one of the hardest problems in the agent stack: Make agents remember.
We did not start with a polished roadmap, a heavyweight architecture review, or a six-month product plan. We started the way many products do: With a concrete user pain, a rough prototype, and a very short distance between “this is interesting” and “we need to ship this.”
That was the beginning of mem9.
Looking back now, mem9 feels less like a conventional software project and more like a compressed startup year. What began as a fast customer-driven prototype quickly became a product, then a platform, and then a much deeper exploration of what agent memory actually requires in production. The visible features changed quickly, but the core question stayed the same: How do you help an agent remember what matters without overwhelming it with everything else?
This is what we have learned so far.
It Started With a Real Problem, Not a Market Thesis
The real beginning of mem9 was not a category map or a strategy deck. It was a customer asking a practical question. If an agent could keep durable memory across sessions, would the user actually feel the difference?
We believed the answer might be yes, but belief was not enough. We needed to make the value obvious, fast.
So we took the shortest path to proof. We built a rough but convincing version, put it in front of a customer, and watched for the reaction. That prototype did exactly what it needed to do. It made the value legible. Once people could see an agent remember something it would normally forget, the conversation changed immediately. We were no longer talking about an interesting capability. We were talking about a product the market was ready for.
That early moment shaped everything that followed. mem9 has always felt like an agent-era product to us because it was born from workflow pain rather than abstract positioning. It was validated almost immediately, and once it was validated, the pace changed. The project stopped behaving like an experiment and started behaving like a startup.
In the first few days, we assembled the core of the system surprisingly quickly: A Go server, memory APIs, TiDB Cloud for storage, search, auth, rate limiting, and the first plugin integrations. Almost immediately after that, support expanded across agent environments such as OpenClaw, OpenCode, and Claude Code, while onboarding improved, multi-tenant foundations landed, and the first mem9.ai site went live. We were not following a neat sequence from infra to product to growth. In reality, all of those tracks were moving at once, because once the value was obvious, hesitation became more expensive than momentum.
Memory Is Not a Storage Feature
Early on, one thing became clear: We were not trying to build “a vector database for agents.” We were trying to build memory that actually improves agent behavior.
That is a small change in framing with very large architectural consequences.
A lot of discussions about agent memory still frame the problem as storage plus retrieval. In practice, that framing is too shallow. The hard part is not whether information can be stored. The hard part is whether the right information comes back at the right time, in the right amount, under real production constraints.
Too little recall and the agent forgets the one detail that matters. Too much recall and the context gets polluted with irrelevant baggage. If recall becomes noisy as the memory corpus grows, trust disappears. So the challenge is not persistence by itself. The challenge is precision.
That insight pushed mem9 very quickly beyond a basic memory store. What started as durable memory soon became a more opinionated system for ingestion, extraction, reconciliation, ranking, and retrieval. We moved toward a server-centric architecture because we wanted integrations to stay thin while the memory logic could evolve centrally. That decision mattered. It let us improve behavior at the core instead of pushing complexity into every plugin or runtime.
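To make the thin-integration idea concrete, here is a minimal sketch of what a plugin-side client could look like in Go. It assumes a hypothetical HTTP memory server; the endpoint paths, payload fields, and response shape are illustrative, not mem9's actual API. The point is what the client does not contain: no ranking, no reconciliation, no extraction.

```go
// Hypothetical sketch of a "thin" plugin-side client: it only forwards raw
// observations and asks the server for relevant memories. Endpoint paths,
// field names, and the Memory shape are illustrative, not mem9's real API.
package memclient

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type Client struct {
	BaseURL string
	APIKey  string
}

type Memory struct {
	ID      string  `json:"id"`
	Content string  `json:"content"`
	Score   float64 `json:"score"`
}

// post sends a JSON payload and decodes the JSON response into out (if non-nil).
func (c *Client) post(path string, payload, out any) error {
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	req, err := http.NewRequest("POST", c.BaseURL+path, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+c.APIKey)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("%s: %s", path, resp.Status)
	}
	if out == nil {
		return nil
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// Remember forwards a raw observation; extraction, reconciliation, and
// deduplication all happen centrally on the server.
func (c *Client) Remember(sessionID, content string) error {
	return c.post("/v1/memories", map[string]string{
		"session_id": sessionID,
		"content":    content,
	}, nil)
}

// Recall asks the server for the few memories it judges relevant, so ranking
// improvements ship centrally without touching any plugin.
func (c *Client) Recall(sessionID, query string, limit int) ([]Memory, error) {
	var out []Memory
	err := c.post("/v1/recall", map[string]any{
		"session_id": sessionID,
		"query":      query,
		"limit":      limit,
	}, &out)
	return out, err
}
```

Keeping the client this thin is what lets recall quality improve for every integration at once, because all of the interesting logic lives behind those two endpoints.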
This is the part of the category that we think is still underestimated. Memory quality is not mainly a UI problem, and it is not purely a model problem either. It is an engineering problem that sits at the intersection of storage, ranking, evaluation, latency, product judgment, and orchestration. If agents are going to do meaningful work in production, they do not just need more context. They need better context.
An API Is Not an Agent Memory Product
The next lesson came quickly. A memory API alone is not a product.
People do not just want memory to exist. They want to see it, inspect it, trust it, correct it, and eventually shape it. That is what pushed mem9 beyond infrastructure.
The next phase of mem9 was about turning an invisible backend capability into something users could actually experience. We built surfaces that made memory legible: Session views, timeline views, analysis workflows, filters, previews, and insight layers that helped people understand not only what had been remembered, but also why it mattered. That work gradually became “Your Memory,” not just as a UI, but as a way to make long-term memory feel concrete instead of abstract.
On the backend, that shift demanded a different kind of engineering. The work moved toward taxonomy, analysis quality, deduplication, responsiveness, and better report workflows. None of that had the drama of the first sprint, but it was just as important. The first phase proved that memory could work. This phase made it understandable and trustworthy.
At the same time, we were building all the less glamorous pieces that turn curiosity into adoption. The public website, docs, analytics, attribution, contact flows, better onboarding, and eventually API documentation. None of those changes are especially cinematic in a commit log, but they are how real products grow. Growth rarely comes from one dramatic launch. More often, it comes from dozens of small improvements that reduce friction, make the value easier to grasp, and help interested users become active users.
That combination of technical depth and product polish mattered. mem9 moved from a fast prototype to a product people could discover, evaluate, and use seriously. Within a little over two weeks, it had already crossed 10,000 users.
We Made Evaluation Part of the Agent Memory Product
Once users start relying on memory in real workflows, intuition is no longer enough.
“It feels better” is a good starting point, but it is not an operating system. We needed ways to measure whether recall quality was improving, regressing, or simply changing shape. That is what pulled us deeper into benchmarks.
Instead of treating benchmarks as side research, we treated them as product infrastructure. We built evaluation harnesses, adapted older multi-turn datasets into more modern agent settings, and created feedback loops that could guide actual engineering decisions. The point was not to chase headline scores. The point was to make memory quality visible and debuggable.
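As a rough illustration of what benchmarks-as-infrastructure can mean, here is a small Go sketch that scores a single retrieval case against labels: precision, recall, and a duplicate rate. The case format and metric choices are simplified assumptions for illustration, not mem9's actual harness.

```go
// A simplified scoring sketch for one retrieval case: given the memory IDs a
// labeled benchmark expects and the IDs the system actually returned, compute
// precision, recall, and a duplicate rate.
package evalsketch

type Case struct {
	Query     string
	Expected  []string // memory IDs a correct recall should surface
	Retrieved []string // memory IDs the system actually returned, in order
}

type Scores struct {
	Precision float64 // fraction of retrieved IDs that were expected
	Recall    float64 // fraction of expected IDs that were retrieved
	DupRate   float64 // fraction of retrieved IDs that were repeats
}

func Score(c Case) Scores {
	expected := make(map[string]bool, len(c.Expected))
	for _, id := range c.Expected {
		expected[id] = true
	}

	seen := make(map[string]bool, len(c.Retrieved))
	hits, dups := 0, 0
	for _, id := range c.Retrieved {
		if seen[id] {
			dups++
			continue
		}
		seen[id] = true
		if expected[id] {
			hits++
		}
	}

	var s Scores
	if len(c.Retrieved) > 0 {
		s.Precision = float64(hits) / float64(len(c.Retrieved))
		s.DupRate = float64(dups) / float64(len(c.Retrieved))
	}
	if len(c.Expected) > 0 {
		s.Recall = float64(hits) / float64(len(c.Expected))
	}
	return s
}
```

Averaged over a dataset of such cases on every release, even simple metrics like these are enough to catch a ranking change that quietly trades recall for precision.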
That distinction mattered even more as mem9 entered more demanding conversations and partnerships, especially around Kimi. Once your system is being evaluated as a serious long-term memory layer for real agent workflows, vague claims stop being useful. You need baselines and evidence to understand where retrieval works, where it fails, and how changes affect precision, recall, duplication, and evidence quality.
In that sense, benchmarking became less like academic scoring and more like instrumentation for product truth. It helped us move beyond taste and into iteration. It gave us a way to turn “memory feels off” into something diagnosable and improvable.
Agent Memory Has to Feel Human
One of the more interesting lessons in building mem9 was that memory should not remain purely invisible.
The APIs matter. The storage model matters. The ranking logic matters. But users do not experience memory as an index. They experience it as continuity. They care about whether the system feels like it knows them, whether it can reconnect threads over time, and whether that continuity feels trustworthy rather than uncanny.
That is part of why we kept investing in visualization and memory management instead of stopping at an API layer. It is also why some of the most distinctive ideas in mem9 came from product intuition rather than architecture diagrams.
A good example is Memory Farm, our visual memory explorer. On the surface, it looks playful: A pixel-art-inspired interface where memories grow as plants in a garden, clustered by topic and connected by relationship. The underlying instinct is serious. Memory becomes easier to understand when users can see patterns, clusters, history, and relationships in more intuitive forms. If memory is central to how an agent relates to a user, then memory products should not feel cold by default.
That lesson shaped more of the product than we expected. The goal was never just to retrieve facts. It was to help people build trust in a system that remembers on their behalf.
The Category Is Crowded Because the Problem Is Real
From the outside, agent memory can look like a hot category. From the inside, it mostly looks like a long list of hard edge cases.
Large context windows are still finite. Important facts get buried under recent noise. Naive retrieval brings back the wrong things. Repetition wastes tokens. Quality degrades as memory grows. And once recall starts to feel random, users lose confidence very quickly.
mem9 was built inside those problems from day one. That is why the product moved so quickly from raw persistence into ingestion, reconciliation, hybrid retrieval, ranking, analysis, benchmarking, and orchestration. The market attention is real, but it is downstream of a very real product need. Everyone building serious agents runs into the same failures sooner or later.
That is also why ecosystem shifts mattered so much to us. As agent frameworks introduced better lifecycle control around how context is assembled, memory stopped looking like a sidecar and started looking like a core part of the context pipeline. That is the point where the category becomes much more interesting. The best memory system is not the one that stores the most. It is the one that helps an agent decide what should stay, what should surface, and what should remain quiet.
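One way to picture memory as part of the context pipeline rather than a sidecar is the selection step at assembly time: candidates arrive already scored, and something has to decide what surfaces under a fixed token budget and what stays quiet. The Go sketch below is a deliberately naive greedy version with assumed fields; a real pipeline would weigh many more signals.

```go
// A deliberately naive sketch of the "decide what surfaces" step: candidates
// arrive already scored, and a greedy pass keeps the highest-scoring ones that
// fit a token budget. Field names and the greedy policy are illustrative.
package contextsketch

import "sort"

type Candidate struct {
	Content string
	Score   float64 // relevance score from the ranking stage
	Tokens  int     // estimated token cost of including this memory
}

// SelectForContext returns the memories that should surface in the assembled
// context, leaving everything else quiet for this turn.
func SelectForContext(candidates []Candidate, tokenBudget int, minScore float64) []Candidate {
	// Highest-scoring first.
	sorted := make([]Candidate, len(candidates))
	copy(sorted, candidates)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })

	var picked []Candidate
	used := 0
	for _, c := range sorted {
		if c.Score < minScore {
			break // everything below the relevance floor stays quiet
		}
		if used+c.Tokens > tokenBudget {
			continue // too expensive for the remaining budget
		}
		picked = append(picked, c)
		used += c.Tokens
	}
	return picked
}
```

Even in this toy form, the interesting decisions are the relevance floor and the budget, not the storage.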
Agent Memory Should Not Stop at Text
As we built mem9, we became increasingly convinced that long-term memory for agents should eventually become much warmer and richer than text-only retrieval.
This became especially vivid in conversations around multimodal use cases. Once you move beyond coding agents and into products built around voice, photos, and video, the meaning of memory changes. A useful memory system should not just retrieve a sentence from years ago. It should be able to retrieve the image, the audio fragment, the interaction, the evidence, and the surrounding context that make the present moment more meaningful.
That direction has shaped a lot of our thinking, especially alongside drive9, our new companion product for files and artifacts. If an agent can accurately bring back not only words but also sounds, images, and other forms of stored experience, memory stops feeling like note-taking and starts feeling much closer to continuity.
That is still an unfolding part of the journey, but it is one of the clearest directions we want to head in.
What Comes Next for mem9
mem9 is still in its early days.
We are improving retrieval quality under harder workloads, and we want stronger sharing models than all-or-nothing memory spaces. We also want better enterprise controls, auditing, and operational visibility, and we want multimodal retrieval to move from an interesting direction into a real production capability. Finally, we want memory to become both more precise for the model and more understandable for the human.
The path so far has already taught us a lot.
First, prove the value with a real customer. Then ship fast enough that the market can respond while the need is still hot. From there, do the less visible work of making the system precise, measurable, trustworthy, and usable. Then keep returning to the same core problem from better and better angles.
That is how we built mem9. Not by starting with a perfect plan, but by moving from proof to product, from storage to retrieval, from intuition to evaluation, and from infrastructure to continuity. Underneath all of it was the same engineering question we started with: How do you help agents remember exactly what matters?
If you are building an agent that needs to remember across sessions, machines, or users, mem9 gives you a managed memory API for coding agents and custom tools. Persistent recall, hybrid retrieval, and shared memory spaces, with no backend to run yourself.
FAQ
What is mem9?
mem9 is a persistent memory layer for AI agents. It gives coding agents, custom tools, and multi-agent systems a shared memory that survives across sessions, machines, and users. mem9 supports hybrid retrieval (vector and keyword search in the same query), runs on TiDB Cloud underneath, and integrates with agent harnesses including OpenClaw, OpenCode, and Claude Code.
How is mem9 different from a vector database?
A vector database handles semantic similarity search. mem9 handles agent memory as a system: Ingestion, deduplication, ranking, retrieval, evaluation, and surfaces for inspection. Storage is one layer of that system, not the whole product. The hard part of agent memory is not whether something can be stored. It is whether the right information comes back at the right time, in the right amount.
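As one small example of the difference, here is a hedged sketch of the deduplication stage: before writing a new memory, compare its embedding to existing ones and skip near-duplicates. The cosine-similarity approach and the threshold are illustrative assumptions, not mem9's actual reconciliation logic.

```go
// A minimal sketch of embedding-based deduplication: a new memory is treated
// as a near-duplicate if its cosine similarity to any existing embedding
// exceeds a threshold. The approach and threshold are illustrative, not
// mem9's actual reconciliation logic.
package dedupsketch

import "math"

// cosine assumes a and b have the same length.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// IsDuplicate reports whether candidate is close enough to an existing
// embedding that writing it would only add noise to future recall.
func IsDuplicate(candidate []float64, existing [][]float64, threshold float64) bool {
	for _, e := range existing {
		if cosine(candidate, e) >= threshold {
			return true
		}
	}
	return false
}
```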
What does mem9 run on?
mem9 runs on TiDB Cloud, a MySQL-compatible distributed SQL database with native vector search and ACID transactions. The same substrate is available on its own as TiDB Cloud Zero for teams who want to build the state layer themselves.
Why does agent memory need both vector and SQL?
Vector search is right for semantic recall, but agents also need exact filters and chronology, ownership and transactions, permissions and audit trails, and structured queries across users, sessions, and tools. A vector-only store cannot answer questions like “which session wrote this memory” or “what tool calls did this user make in the last hour.” Production agent memory needs both vector and SQL semantics in the same backend.
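As a concrete sketch of what "both in the same backend" buys you, the Go snippet below runs a hybrid query against a TiDB-style table: vector distance for semantic recall, plus exact filters for owner, recency, and a keyword. It assumes a table with a VECTOR column and TiDB's VEC_COSINE_DISTANCE function; the schema and column names are illustrative, not mem9's.

```go
// A sketch of a hybrid query: vector similarity for semantic recall, plus the
// exact filters (user, recency, keyword) that a vector-only store cannot
// express. Assumes a TiDB-style "memories" table with a VECTOR column; the
// schema and column names are illustrative.
package hybridsketch

import (
	"database/sql"
	"time"

	// Registers the MySQL-compatible driver used to open the TiDB connection elsewhere.
	_ "github.com/go-sql-driver/mysql"
)

type Hit struct {
	ID       string
	Content  string
	Distance float64
}

// RecallForUser retrieves the memories closest to queryVec that also satisfy
// exact SQL constraints: owned by this user, written in the last 30 days, and
// containing the given keyword.
func RecallForUser(db *sql.DB, userID, keyword, queryVec string, limit int) ([]Hit, error) {
	// queryVec is a vector literal such as "[0.12, -0.03, ...]".
	rows, err := db.Query(`
		SELECT id, content, VEC_COSINE_DISTANCE(embedding, ?) AS dist
		FROM memories
		WHERE user_id = ?
		  AND created_at > ?
		  AND content LIKE CONCAT('%', ?, '%')
		ORDER BY dist
		LIMIT ?`,
		queryVec, userID, time.Now().AddDate(0, 0, -30), keyword, limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var hits []Hit
	for rows.Next() {
		var h Hit
		if err := rows.Scan(&h.ID, &h.Content, &h.Distance); err != nil {
			return nil, err
		}
		hits = append(hits, h)
	}
	return hits, rows.Err()
}
```

Everything in the WHERE clause is ordinary SQL, which is also what makes permissions and audit trails tractable in the same store.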
How does mem9 handle memory quality at scale?
mem9 treats evaluation as product infrastructure rather than side research. We built evaluation harnesses, adapted multi-turn datasets into modern agent settings, and instrumented the system so that precision, recall, deduplication, and evidence quality are measurable across releases. The goal is to turn “memory feels off” into something diagnosable.
When should I use mem9 vs. building my own memory layer on TiDB Cloud Zero?
Use mem9 when you want a managed memory API and want to skip the backend work. mem9 handles ingestion, ranking, retrieval, and surfaces for you. Use TiDB Cloud Zero when you want to design the schema, control access patterns, and own the state layer yourself. Both run on the same TiDB Cloud substrate, so neither commits you to a one-way migration.