Key Takeaways
- Agent memory is an infrastructure pattern, not a model feature: Store the past outside the model and inject the relevant slice into each prompt.
- Every memory system reduces to one loop: Store a row, embed it, search for the nearest vectors, and inject the matches.
- TiDB holds text, vectors, and metadata in a single table, so filtering and vector (or hybrid BM25) search run in one query with no separate vector store to sync.
- You can build the whole loop on the free tier of TiDB Cloud Starter, with the full SQL and pytidb code in the pingcap/agent-rules repo.
I gave a session at Microsoft Build 2026 on agent memory with TiDB. A few people asked for the code afterward, so here’s a complete write-up of the session: The same pattern as the talk, with copy-paste-ready schema and queries.
You can watch the original Microsoft Build 2026 session here.
Why an Agent Memory Database is an Infrastructure Problem
Large language models are stateless. Every API call starts from scratch. Whatever a user told the agent yesterday (their preferences, their last support ticket, the back-and-forth that finally landed on the right answer) is gone the moment the response finishes streaming.
Memory is how you close that gap. It is not a model feature; it is an infrastructure pattern. You store the past somewhere outside the model, and on each new turn you pull back the relevant slice and inject it into the prompt. The agent looks like it remembers because you remembered for it.
The real engineering question is where you put that memory, and how you find the right piece of it fast.
The Four-Step Memory Loop
Strip away the framework noise and every agent memory system reduces to the same loop:
- Store each thing worth remembering as a row in a table.
- Embed the row’s text into a vector that captures its meaning.
- Search for the rows whose vectors are closest to the current query. Those are your relevant memories.
- Inject those memories into the next LLM prompt.
Summarization, fact extraction, and decay scoring all sit on top of this loop. Get the loop right first.
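To make the four steps concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not how TiDB or a real embedding model works: `embed` is a toy bag-of-words stand-in for an embedding model, and `MemoryStore` and `build_prompt` are hypothetical names invented for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity, so lower means closer."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class MemoryStore:
    """Hypothetical in-memory substitute for the memories table."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def store(self, user_id: str, content: str) -> None:
        # Steps 1 and 2: keep the text and its vector together in one row.
        self.rows.append(
            {"user_id": user_id, "content": content, "embedding": embed(content)}
        )

    def search(self, user_id: str, query: str, limit: int = 5) -> list[str]:
        # Step 3: rank this user's rows by distance to the query vector.
        query_vec = embed(query)
        mine = [row for row in self.rows if row["user_id"] == user_id]
        mine.sort(key=lambda row: cosine_distance(row["embedding"], query_vec))
        return [row["content"] for row in mine[:limit]]


def build_prompt(memories: list[str], message: str) -> str:
    # Step 4: inject the recalled memories ahead of the new user message.
    recalled = "\n".join(f"- {m}" for m in memories)
    return f"Known facts about this user:\n{recalled}\n\nUser: {message}"
```

A bag-of-words vector only matches on shared words, which is exactly the limitation a real embedding model removes. The shape of the loop stays the same when you swap it in.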
Your Agent Memory Database in One Table
A memory row needs to do three things at once: Hold the text, hold a vector representation of that text, and hold the metadata that tells you whose memory it is and when it was created. TiDB’s built-in vector search lets all three live in a single table:
CREATE TABLE memories (
    id BIGINT PRIMARY KEY AUTO_RANDOM,
    user_id VARCHAR(64) NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(1024),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_user (user_id),
    VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))
);
A few details worth calling out:
- VECTOR(1024) is a native column type. No extension, no separate vector store running alongside your database, no sync job between them.
- The VECTOR INDEX ... VEC_COSINE_DISTANCE line builds an HNSW vector index, which keeps nearest-neighbor lookups fast as the table grows.
- AUTO_RANDOM instead of AUTO_INCREMENT matters more than people coming from single-node MySQL expect. Sequential integer keys create write hotspots on a distributed system because every insert lands on the same node. Random keys spread inserts across the cluster.
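The hotspot effect is easy to see in a toy simulation. The sketch below assumes a made-up cluster of four nodes that each own one contiguous slice of a small key space. The constants and function names are hypothetical, and real TiDB splits and rebalances its data ranges dynamically, so treat this as intuition rather than a model of the actual storage layer.

```python
import random
from bisect import bisect_right
from collections import Counter

# Hypothetical cluster: four nodes, each owning one contiguous key range.
KEY_SPACE = 2**16
SPLIT_POINTS = [KEY_SPACE // 4, KEY_SPACE // 2, 3 * KEY_SPACE // 4]


def node_for(key: int) -> int:
    """Return the index of the node whose range contains this key."""
    return bisect_right(SPLIT_POINTS, key)


def writes_per_node(keys) -> Counter:
    """Count how many inserts land on each node."""
    return Counter(node_for(key) for key in keys)


# Sequential keys: 1,000 inserts continuing from the current maximum id.
next_id = 50_000
sequential = writes_per_node(range(next_id, next_id + 1000))

# Random keys: 1,000 inserts drawn uniformly from the whole key space.
rng = random.Random(0)
randomized = writes_per_node(rng.randrange(KEY_SPACE) for _ in range(1000))

print("sequential:", dict(sequential))
print("random:    ", dict(randomized))
```

Every sequential insert lands on the node that owns the top of the key range, while the random keys touch all four nodes.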
Storing a Memory
If you already have an embedding pipeline, pass the vector in:
INSERT INTO memories (user_id, content, embedding)
VALUES (
    'user_42',
    'User prefers window seats on flights longer than 4 hours.',
    '[0.0123, -0.0456, ..., 0.0789]'
);
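If your embeddings come from application code, you need to turn a list of floats into the bracketed string shown above. Here is one way to do that in Python, as a sketch: `to_vector_literal` and `insert_params` are hypothetical helpers, and the `%s` placeholders assume a PyMySQL-style driver.

```python
import json
import math


def to_vector_literal(values, dim=1024):
    """Serialize numbers as a bracketed vector string such as '[0.1, -0.2]'."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    floats = [float(v) for v in values]
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("vector contains NaN or infinity")
    return json.dumps(floats)


# Assumption: the driver uses %s placeholders, as PyMySQL does.
INSERT_SQL = (
    "INSERT INTO memories (user_id, content, embedding) VALUES (%s, %s, %s)"
)


def insert_params(user_id, content, embedding, dim=1024):
    """Build the parameter tuple that pairs with INSERT_SQL."""
    return (user_id, content, to_vector_literal(embedding, dim))
```

Checking the dimension before the insert gives you a clear application-side error when a model swap changes the vector length, instead of a rejected write.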
If you would rather not run your own embedding pipeline, TiDB Cloud can generate vectors for you on insert. Define the embedding column as a generated column that calls a hosted model:
ALTER TABLE memories
    MODIFY embedding VECTOR(1024) GENERATED ALWAYS AS (
        EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2', content)
    ) STORED;
After that, you insert text and the database produces and stores the vector:
INSERT INTO memories (user_id, content)
VALUES ('user_42', 'User prefers window seats on long flights.');
One less moving part to maintain. In my Build session I used OpenAI embeddings; here I switched to the Titan model because TiDB Cloud hosts it for free, so everything in this post runs with no API keys and no credit card. If you prefer OpenAI, Cohere, or Jina embeddings, swap in that model name and bring your own key.
Recalling Memories with Vector Search
When the agent receives a new message, embed the message and ask the database which stored memories mean something similar:
SELECT
    id,
    content,
    VEC_COSINE_DISTANCE(
        embedding,
        EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2',
                   'Where should I book his seat?')
    ) AS distance
FROM memories
WHERE user_id = 'user_42'
ORDER BY distance
LIMIT 5;
The query about booking a seat returns “User prefers window seats” even though the two sentences share almost no words. That is the embedding doing its job: It matches on meaning, not on exact text.
Two things make this query nice in TiDB specifically. First, the WHERE user_id = 'user_42' filter runs in the same query as the vector search. There is no two-system dance, no joining results back together in application code. One round trip. Second, your source-of-truth memory write and your retrieval logic live in the same transactional database, which keeps the application model much simpler than syncing a separate vector system.
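In application code, the user ID and the message come from outside, so they should be bound as parameters rather than pasted into the SQL string. The sketch below shows one possible shape for that. `build_recall_query` is a hypothetical helper, the `%s` placeholders assume a PyMySQL-style driver, and the cap of 50 rows is an arbitrary guard chosen for illustration.

```python
# The recall query from above, with user-supplied values left as placeholders.
RECALL_SQL = """
SELECT
    id,
    content,
    VEC_COSINE_DISTANCE(
        embedding,
        EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2', %s)
    ) AS distance
FROM memories
WHERE user_id = %s
ORDER BY distance
LIMIT %s
"""


def build_recall_query(user_id: str, message: str, limit: int = 5):
    """Return (sql, params) ready for a driver call like cursor.execute."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    if not message.strip():
        raise ValueError("message must not be empty")
    # Parameter order follows the placeholder order in RECALL_SQL.
    return RECALL_SQL, (message, user_id, limit)
```

With a driver such as PyMySQL you would pass the pair straight through, for example `cursor.execute(*build_recall_query("user_42", text))`.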
Hybrid Search for When Meaning is Not Enough
Pure vector search is strong at concepts and weak at proper nouns. A query like “the issue with order #A-9912” will often surface memories about other orders that feel conceptually close but are not the right record. That is where keyword search earns its place. TiDB has BM25 full-text search built in, so you can blend both signals in a single hybrid search query:
CREATE FULLTEXT INDEX idx_content ON memories(content);

SELECT
    id,
    content,
    (0.7 * VEC_COSINE_DISTANCE(
        embedding,
        EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2', :query))
     - 0.3 * fts_match_score('idx_content', :query)) AS hybrid_score
FROM memories
WHERE user_id = :user_id
ORDER BY hybrid_score
LIMIT 5;
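The 0.7 and 0.3 are weights you tune for your workload. One gotcha worth flagging: Vector distance is lower-is-better, but BM25 relevance is higher-is-better, which is why the formula subtracts the full-text score instead of adding it. The two signals also sit on different scales (cosine distance stays between 0 and 2, while BM25 scores are unbounded), so tune the weights against real queries rather than trusting the defaults. In production RAG, this hybrid setup often beats either approach on its own, and you pay for it with one extra index.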
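If you want to experiment with the weighting outside the database, the same blend can be sketched in Python over a handful of candidate rows. This is a variation rather than the query above: it min-max normalizes each signal first so the weights act on comparable ranges. `hybrid_rank` and the candidate tuples are made up for illustration.

```python
def min_max(values):
    """Rescale values to the 0..1 range; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(candidates, w_vec=0.7, w_text=0.3):
    """Rank (id, cosine_distance, bm25_score) tuples, best match first.

    Distance is lower-is-better and BM25 is higher-is-better, so the text
    score is subtracted. A lower combined score means a better match.
    """
    if not candidates:
        return []
    distances = min_max([c[1] for c in candidates])
    text_scores = min_max([c[2] for c in candidates])
    scored = [
        (w_vec * d - w_text * t, c[0])
        for c, d, t in zip(candidates, distances, text_scores)
    ]
    scored.sort()
    return [row_id for _, row_id in scored]


# Hypothetical candidates: "b" is slightly farther in vector space than "a"
# but is the only one with a strong keyword match.
rows = [("a", 0.10, 1.0), ("b", 0.12, 9.0), ("c", 0.60, 0.0)]
print(hybrid_rank(rows))              # keyword match lifts "b" to the top
print(hybrid_rank(rows, w_text=0.0))  # pure vector ranking
```

Setting one weight to zero collapses the blend back to a single signal, which is a handy sanity check while tuning.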
The Agent Memory Loop in Python with pytidb
Most AI developers I work with live in Python, not SQL. The pytidb SDK wraps the same primitives, and with auto-embedding turned on you never touch a vector directly. Insert text. Query with text. The library handles the rest.
from pytidb import TiDBClient
from pytidb.embeddings import EmbeddingFunction
from pytidb.schema import TableModel, Field
from pytidb.datatype import TEXT

db = TiDBClient.connect(
    database_url="mysql+pymysql://USER:PASS@HOST:4000/test"
    "?ssl_verify_cert=true&ssl_verify_identity=true"
)

# Auto-embedding: the embedding column is derived from `content`
# server-side.
embed = EmbeddingFunction(model_name="tidbcloud_free/amazon/titan-embed-text-v2")

class Memory(TableModel):
    id: int = Field(primary_key=True)
    user_id: str
    content: str = Field(sa_type=TEXT)
    embedding: list[float] = embed.VectorField(source_field="content")

# "overwrite" recreates the table on each run, which suits a demo;
# any rows already in it are dropped.
memories = db.create_table(schema=Memory, if_exists="overwrite")

# Store. No vector math here; embeddings are generated automatically.
memories.bulk_insert([
    Memory(user_id="user_42",
           content="User prefers window seats on long flights."),
    Memory(user_id="user_42",
           content="User flies United and Delta only."),
])

# Recall. Pass plain text; pytidb embeds the query and ranks by
# cosine distance.
results = (
    memories.search("Where should I book his seat?")
    .filter({"user_id": "user_42"})
    .limit(5)
    .to_list()
)
One call to insert memories, one chained call to retrieve the relevant ones. That is the entire loop.
Why an Agent Memory Database, Not a Dedicated Vector Store
You can build this on a dedicated vector database, with Postgres for user profiles, S3 for transcripts, and Redis for session state. Many teams start there. When memory lives in the same engine as the rest of your agent’s state, a few things change:
- Filtering, sorting, and vector search run in one query plan. WHERE user_id = ... AND created_at > ... ORDER BY executes as a single statement. No application-layer joins between systems.
- You get ACID transactions across the whole agent’s state. When the agent writes a new memory, deducts a credit, and logs an event, all three commit together or none of them do. The alternative is debugging partial writes after the fact.
- You have one copy of the data. Embeddings live next to the source text. When you change embedding models, you re-embed in place. No sync pipeline drifting out of date.
- You get multi-tenancy at real scale. Each user or agent session can have an isolated branch of the database, created in milliseconds with copy-on-write storage. Manus runs this pattern in production, creating close to 1 million database tenants within three months on its agent platform.
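The all-or-nothing behavior described in the transactions bullet can be sketched with any transactional database. The example below uses Python's built-in sqlite3 module purely as a local stand-in so it runs without a cluster. It is not TiDB, and the `credits` and `events` tables and the `remember_and_charge` function are hypothetical.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with three hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE memories (id INTEGER PRIMARY KEY, user_id TEXT, content TEXT);
        CREATE TABLE credits (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE events (id INTEGER PRIMARY KEY, user_id TEXT, kind TEXT);
        INSERT INTO credits (user_id, balance) VALUES ('user_42', 1);
        """
    )
    return conn


def remember_and_charge(conn, user_id, content, cost=1):
    """Write a memory, deduct a credit, and log an event as one unit."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO memories (user_id, content) VALUES (?, ?)",
            (user_id, content),
        )
        charged = conn.execute(
            "UPDATE credits SET balance = balance - ? "
            "WHERE user_id = ? AND balance >= ?",
            (cost, user_id, cost),
        )
        if charged.rowcount != 1:
            # Raising here also undoes the memory insert above.
            raise ValueError("insufficient credits")
        conn.execute(
            "INSERT INTO events (user_id, kind) VALUES (?, 'memory_written')",
            (user_id,),
        )
```

When the credit check fails, the memory row written a moment earlier disappears with the rollback, so there is no partial write to clean up afterward.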
Agent Memory Database: Where to Go Next
Everything in this post runs on the free tier of TiDB Cloud Starter. You can build the entire memory loop (schema, auto-embedding, vector search, and hybrid retrieval) without entering a credit card. The same primitives scale up to production workloads: Pinterest runs 1.5 PB of data at a peak of 8 million QPS on TiDB, and Flipkart benchmarked TiDB as a hot store to 1 million QPS. For metadata-heavy consolidation, Atlassian collapsed 750 PostgreSQL clusters down to 16 on TiDB.
If you would rather not hand-roll the memory layer, the open-source mem9 project sits on top of these same TiDB primitives and provides a memory API with fact extraction, deduping, and decay built in. Same storage underneath, with the memory semantics handled for you.
The full code from this post, including the Python version using the pytidb SDK, is in the pingcap/agent-rules repository. Clone it, point it at a free TiDB Cloud Starter cluster, and you have a working agent memory loop in a few minutes.