Introduction

You finally received the green light to ship RAG. But then reality shows up:

  • Someone asks, “Where do embeddings live?”
  • Another person asks, “How do we keep them in sync?”
  • Your on-call asks, “So… we’re running two databases now?”

This playbook walks you through two distinct approaches:

  1. The classic “two-DB tango”: transactional DB + standalone vector database.
  2. The TiDB route: an embedded vector database where vector similarity search and SQL live together in one system.

You’ll discover a practical path to RAG retrieval that reduces moving parts, maintains data consistency, and supports real hybrid search patterns.

Who Is This Playbook For: Storage Leads, SREs, & AI Startup Developers

This playbook is for the engineers who own (or inherit) the reliability, scale, and cost of a RAG data stack. If you manage or build these systems, you know that while new capabilities ship fast, operational drag arrives later when simple architecture diagrams turn into incident reports.

As a storage lead or SRE, your KPIs involve:

  • Latency Stability: Keeping search and retrieval stable even under heavy load.
  • On-Call Surface Area: Reducing the number of managed systems to minimize pages.
  • Data Integrity: Avoiding “silent failure modes” like drift between your source-of-truth SQL DB and a standalone vector database like Pinecone or Milvus.
  • Multi-tenant Control: Enforcing sane behavior so one AI experiment doesn’t cause a global outage window.

And if you’re a developer at an AI startup, you’re optimizing for:

  • Speed to production: Shipping RAG without standing up and integrating yet another datastore.
  • Simplicity: Fewer services, fewer pipelines, fewer “where does truth live?” debates.
  • Cost discipline: Preventing vector infrastructure and sync pipelines from becoming your biggest surprise bill.
  • Flexibility: Iterating on schemas, chunking strategies, and models without rebuilding your entire platform.

TiDB 8.5 addresses these challenges by embedding vector search directly into the SQL engine. This allows you to store chunks, metadata, and embeddings together, run ANN retrieval with SQL joins, and enforce multi-tenant guardrails to keep performance stable while controlling costs.

Pain: RAG Needs, Extra Vector DB Deeds

The fundamental pain of modern RAG is the “extra luggage” of a separate data plane.

Two Databases, Double Trouble

A standalone vector database such as Pinecone, Qdrant, or Milvus can absolutely work. But the minute you introduce a second data plane, you add:

  • Another cluster to deploy, scale, patch, and secure.
  • Another query path to monitor (and explain) when latency spikes.
  • Another data pipeline to keep consistent.

And in practice, your “RAG architecture” becomes a choreography problem: ingestion, chunking, embedding generation, vector upserts, metadata joins, and cache invalidation across systems.

Sync Drift & Latency Spikes

Two failure modes show up fast in production:

  1. Sync drift: Your source-of-truth data changes, but the embeddings in the vector store lag or fail to update.
  2. Hybrid query tax: Your application must perform a “vector search over here” and then a “SQL filter/join over there”. This makes every request a multi-hop distributed transaction that increases latency and lacks the guardrails of a single query engine.

The worst part? These issues often look “fine” in happy-path testing, but then appear in production under bursty traffic, partial failures, or backfills.

DIY Route: Standalone Vector Store Ops (The Two-DB Tango)

The “two-DB tango” is the classic approach of maintaining a transactional database alongside a standalone vector database.

Deploy, ETL, Monitor, Repeat

This route requires constant operational toil:

  • Provisioning: You must build and run a separate cluster, choosing index types and setting sharding.
  • Data Pipeline: You have to ETL embeddings from your OLTP database to the vector store on a schedule, often using change-data-capture (CDC) pipelines with complex retry logic.
  • Maintenance: You must patch, upgrade, and run separate backup/restore drills for a second system.

It works. It just has a cost: more moving parts.

Hybrid Queries in App Code

In a two-system design, “hybrid search” often turns into application glue:

  1. Query vector DB → get topK IDs.
  2. Query SQL DB → join IDs to metadata, permissions, tenant rules, freshness constraints.
  3. Re-rank in the application.
  4. Return results.

That is doable. It is also a reliability bet: if either side is slow or stale, the whole request degrades.
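The app-side glue from steps 1–4 can be sketched in a few lines (pure Python with synthetic inputs; in production, the hits would come from your vector DB client and the metadata from a SQL query over the returned IDs):

```python
# Hypothetical sketch of two-system "hybrid search" glue.
# vector_hits: list of (chunk_id, distance) from the vector store
# metadata:    dict chunk_id -> row dict from the SQL DB

def merge_and_rerank(vector_hits, metadata, tenant_id, k):
    """Join vector topK hits to SQL metadata, filter, and re-rank."""
    results = []
    for chunk_id, distance in vector_hits:
        row = metadata.get(chunk_id)
        if row is None:            # drift: the vector store has an ID SQL doesn't
            continue
        if row["tenant_id"] != tenant_id or not row["is_active"]:
            continue
        results.append({"chunk_id": chunk_id, "distance": distance, **row})
    results.sort(key=lambda r: r["distance"])   # re-rank (here: by distance)
    return results[:k]

hits = [(1, 0.12), (2, 0.30), (3, 0.07)]
meta = {
    1: {"tenant_id": 42, "is_active": True, "chunk_text": "a"},
    2: {"tenant_id": 7,  "is_active": True, "chunk_text": "b"},  # wrong tenant
    3: {"tenant_id": 42, "is_active": True, "chunk_text": "c"},
}
top = merge_and_rerank(hits, meta, tenant_id=42, k=10)
```

Note that every failure mode from the previous section lives in this function: a missing metadata row is silent drift, and a slow SQL lookup stalls the whole request.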

TiDB Route: Embedded Vector DB Power

TiDB’s approach is simple: put vectors next to your rows and let SQL do the heavy lifting.

TiDB supports vector data types (VECTOR / VECTOR(D)) designed for storing embeddings, with dimension enforcement and vector index support. It also supports vector search indexes (ANN via HNSW) and distance functions like cosine and L2.

Vector Columns Beside Rows

Instead of splitting your world in two, you store the chunk text, the metadata you filter on (tenant, permissions), and the embedding vector in the same table.

Example schema (single-DB retrieval table):

CREATE TABLE rag_chunks (
    id          BIGINT PRIMARY KEY,
    tenant_id   BIGINT NOT NULL,
    doc_id      BIGINT NOT NULL,
    chunk_text  TEXT,
    source      VARCHAR(128),
    meta        JSON,  -- flexible metadata (tags, uri, permissions, etc.)
    updated_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

    -- 1536 is a common embedding dimension; use whatever your model outputs.
    embedding   VECTOR(1536),

    -- Build an ANN vector index (HNSW) for fast similarity search.
    VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))
);

This ensures ACID consistency across your transactional and vector data, so when a row updates, the vector updates with it, eliminating sync drift.

Vector columns are first-class: they support dimension enforcement and an optimized storage format vs. dumping vectors into JSON. And vector search indexing is available across TiDB Self-Managed and TiDB Cloud deployment tiers (with minimum version guidance).

Figure 1. An example of a single TiDB table containing chunk text, metadata columns, and a VECTOR embedding column with an ANN index.

Hybrid Similarity + SQL Joins

By using an embedded vector database, you can perform similarity searches, tenant filtering, and permission checks with one query planner.

Basic vector similarity search (cosine distance):

SELECT id, doc_id, chunk_text,
       VEC_COSINE_DISTANCE(embedding, '[0.1, 0.2, ...]') AS distance
FROM rag_chunks
ORDER BY distance
LIMIT 10;

This ORDER BY ... LIMIT pattern is how TiDB applies the vector index for ANN search.

Important gotcha (and how to handle it):

A pre-filter in WHERE can prevent the vector index from being used, because SQL semantics require filtering before sorting. In TiDB, use the “KNN first, filter after” pattern instead:

SELECT *
FROM (
    SELECT id, tenant_id, doc_id, chunk_text,
           VEC_COSINE_DISTANCE(embedding, '[0.1, 0.2, ...]') AS distance
    FROM rag_chunks
    ORDER BY distance
    LIMIT 50
) t
WHERE tenant_id = 42
ORDER BY distance
LIMIT 10;

Yes, you may get back fewer than 10 rows if the post-filter removes candidates; widen the inner LIMIT (the candidate pool) to compensate. That is the tradeoff.
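To see the tradeoff concretely, here is a toy simulation (purely synthetic candidates; the `i % 7` tenant assignment is arbitrary) of a 50-row candidate pool surviving a tenant post-filter:

```python
# Simulate "KNN first, filter after": take the top-50 candidates by
# distance, then apply the tenant filter, as in the query above.
candidates = [
    (i, 42 if i % 7 == 0 else 99, round(i * 0.01, 2))  # (id, tenant_id, distance)
    for i in range(1, 51)
]
survivors = [c for c in candidates if c[1] == 42][:10]
# Only 7 of the 50 candidates belong to tenant 42, so the outer
# LIMIT 10 returns fewer than 10 rows despite a 50-row pool.
```

If your post-filter is selective (small tenants, strict permissions), size the candidate pool accordingly.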

Optional upgrade: full-text + vector (hybrid search)

TiDB also offers full-text search with relevance ordering (BM25) and multilingual parsing, which is complementary to vector search in RAG scenarios.

Example: add a FULLTEXT index on chunk text:

ALTER TABLE rag_chunks
    ADD FULLTEXT INDEX (chunk_text)
    WITH PARSER MULTILINGUAL
    ADD_COLUMNAR_REPLICA_ON_DEMAND;

Then you can retrieve exact-keyword matches (and join/filter normally) using fts_match_word().

Figure 2. Hybrid search flow where TiDB runs full-text and vector search, then results are combined and reranked.

Labor Showdown: Ops Hours & Incidents Count

Now let’s make this concrete. Below is a simple way to compare operational surface area. This is not a benchmark, but rather a workload checklist you can plug your own numbers into.

Workstream             SQL DB + vector DB                 TiDB embedded vector DB
Provisioning           Two clusters / services            One cluster
Monitoring             Two dashboards, two alert trees    One dashboard, one alert tree
Data correctness       Sync + drift detection             Single write path
Backfills              Dual write + index rebuild plans   Table load + index build
Incident blast radius  Query path splits across systems   One query engine

Step-by-Step Guide

If you want a working RAG retrieval layer quickly, follow these steps.

Upgrade, Ingest, Query, Tune

Step 1: Upgrade / Install TiDB 8.5 (with Vector Search enabled)

TiDB Cloud (a fully-managed DBaaS offering of TiDB, fastest path):

  • Create a TiDB Cloud cluster on a plan that supports Vector Search.

TiDB Self-Managed (TiDB on-prem with full operational control, production-like):

  • Deploy TiDB 8.5 plus TiFlash nodes (ANN indexing relies on TiFlash).
  • Run a quick smoke test to prove vector types and functions are active:
CREATE TABLE vector_smoke_test (
    id INT PRIMARY KEY,
    v  VECTOR(3)
);

INSERT INTO vector_smoke_test VALUES (1, '[0.3, 0.5, -0.1]');

-- Distance from a vector to itself should be 0
SELECT VEC_L2_DISTANCE(v, '[0.3, 0.5, -0.1]') FROM vector_smoke_test;

Step 2: Create tables with VECTOR(D) (pick your embedding dimension)

Pick D based on your embedding model’s output dimension. Use fixed dimension (VECTOR(D)) if you want an ANN index.

Recommended baseline schema (RAG-ready):

CREATE TABLE documents (
    id        BIGINT PRIMARY KEY,
    tenant_id BIGINT NOT NULL,
    title     VARCHAR(255),
    is_active TINYINT NOT NULL DEFAULT 1
);

CREATE TABLE rag_chunks (
    chunk_id   BIGINT PRIMARY KEY,
    tenant_id  BIGINT NOT NULL,
    doc_id     BIGINT NOT NULL,
    chunk_text TEXT NOT NULL,
    embedding  VECTOR(1536) NOT NULL,  -- <- set to your model’s dimension
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

    VECTOR INDEX idx_embedding ((VEC_L2_DISTANCE(embedding))) USING HNSW
);

Keep in mind: a vector index can only be built on a single vector column, and it cannot be part of a composite index with other columns.

If you add the vector index after table creation and the table has no TiFlash replica yet, create one first:

ALTER TABLE rag_chunks SET TIFLASH REPLICA 1;

Step 3: Ingest embeddings (TiDB Lightning or bulk INSERT)

You can insert vectors using the bracket string format: '[<float>, <float>, ...]'.
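If you generate embeddings in your app, a small helper (my own convenience function, not a TiDB client API) can render a Python float list into that bracket string:

```python
def to_vector_literal(vec):
    """Render a list of floats as TiDB's bracket string format:
    '[<float>, <float>, ...]'."""
    return "[" + ", ".join(repr(float(x)) for x in vec) + "]"

to_vector_literal([0.01, 0.02, 0.03])  # '[0.01, 0.02, 0.03]'
```

Pass the resulting string as an ordinary query parameter; TiDB parses it into the VECTOR column.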

Option A: TiDB Lightning (recommended for big backfills)

  • Export chunks to CSV/Parquet with columns like:
    • chunk_id, tenant_id, doc_id, chunk_text, embedding
  • Store embedding as a string like '[0.01, 0.02, ...]'.
  • Use TiDB Lightning to load fast, then wait for indexes to finish building before benchmarking.

Option B: Bulk INSERT (simple + works everywhere)

  • Batch inserts (e.g., 1k–10k rows per batch) and keep writes idempotent:
INSERT INTO rag_chunks (chunk_id, tenant_id, doc_id, chunk_text, embedding)
VALUES
    (1, 42, 1001, '...', '[0.01, 0.02, ...]'),
    (2, 42, 1001, '...', '[0.03, 0.04, ...]')
ON DUPLICATE KEY UPDATE
    chunk_text = VALUES(chunk_text),
    embedding  = VALUES(embedding);
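The batching itself is trivial app code (a sketch; send each batch with your MySQL-protocol driver, e.g. a multi-row INSERT built from the batch):

```python
def batched(rows, size):
    """Yield successive batches of `size` rows for multi-row INSERTs."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Synthetic rows: (chunk_id, tenant_id, doc_id, chunk_text, embedding_literal)
rows = [(i, 42, 1001, f"chunk {i}", "[0.0, 0.0]") for i in range(25)]
batches = list(batched(rows, 10))
# 25 rows -> batches of 10, 10, 5; because each INSERT is keyed on
# chunk_id and upserts on conflict, retrying a failed batch is safe.
```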

Option C: Auto Embedding (fastest proof, no vectors required up front)

If you want to validate the end-to-end loop without generating embeddings in your app first, use Auto Embedding. You simply insert plain text, and TiDB generates and stores the vectors for you.

-- Create a table with auto-embedding (dimension must match the model)
CREATE TABLE documents (
    id INT PRIMARY KEY AUTO_INCREMENT,
    content TEXT,
    content_vector VECTOR(1024) GENERATED ALWAYS AS (
        EMBED_TEXT("tidbcloud_free/amazon/titan-embed-text-v2", content)
    ) STORED,
    VECTOR INDEX ((VEC_L2_DISTANCE(content_vector))) USING HNSW
);

-- Insert text only (vectors are generated automatically)
INSERT INTO documents (content) VALUES
    ("Electric vehicles reduce air pollution in cities."),
    ("Solar panels convert sunlight into renewable energy."),
    ("Plant-based diets lower carbon footprints significantly.");

-- Search using plain text (TiDB embeds the query automatically)
SELECT id, content
FROM documents
ORDER BY VEC_EMBED_L2_DISTANCE(
    content_vector,
    "Renewable energy solutions for environmental protection"
)
LIMIT 3;

You can use Option C to get a working semantic retrieval loop in minutes. You can then switch to Option A/B when you want full control over your embedding pipeline, dimensions, models, and metadata joins.

Step 4: Build the RAG service (ANN via ORDER BY L2 distance + SQL joins)

Your service flow:

  1. Take user query text
  2. Generate query_embedding in the app
  3. Run ANN search (KNN via ORDER BY … LIMIT)
  4. Join to authoritative metadata (tenant, permissions, status)
  5. Return top chunks to the LLM

Core ANN query (L2 distance):

SELECT chunk_id, doc_id, chunk_text,
       VEC_L2_DISTANCE(embedding, :qvec) AS distance
FROM rag_chunks
ORDER BY distance
LIMIT :k;
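In app code the flow looks roughly like this (a sketch: `embed` and the cursor handling are placeholders for your own stack; since TiDB speaks the MySQL protocol, any MySQL driver's `%s` placeholders work):

```python
# Steps 2-4 of the service flow, against a MySQL-protocol cursor.
KNN_SQL = """
SELECT chunk_id, doc_id, chunk_text,
       VEC_L2_DISTANCE(embedding, %s) AS distance
FROM rag_chunks
ORDER BY distance
LIMIT %s
""".strip()

def retrieve(cursor, query_text, embed, k=10):
    """Embed the query, run the KNN query, return rows for the LLM."""
    qvec = "[" + ", ".join(str(x) for x in embed(query_text)) + "]"
    cursor.execute(KNN_SQL, (qvec, k))
    return cursor.fetchall()
```

Join the returned chunk IDs to your authoritative metadata in the same statement or a follow-up query; either way it stays inside one database.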

Step 5: Bench & tune: index build, recall, QPS, and tail latency

Before you benchmark:

  • Confirm the vector index is fully built after large loads (otherwise results look artificially slow).
  • Verify the index is being used with EXPLAIN ANALYZE.

Benchmarks to run (minimum set):

  • Latency: p50 / p95 / p99 for the retrieval query
  • Throughput: max sustainable QPS at acceptable p95
  • Recall: recall@k on a labeled eval set (or a “silver truth” set from historical clicks)
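Recall@k is simple to compute once you have a labeled eval set (a sketch, assuming a list of relevant chunk IDs per query):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant IDs that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

recall_at_k([3, 1, 7, 9], relevant=[1, 2, 3], k=3)  # 2/3
```

Run it over your eval queries and average; track the number alongside p95 so tuning one doesn't silently regress the other.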

Primary tuning levers:

  • Candidate pool size: increase the inner LIMIT (candidate k) for better recall; watch p95
  • Embedding dimension: lower dimensions reduce compute and index cost (tradeoff: accuracy)
  • Payload size: exclude vector columns from results; keep returned rows small
  • Warm-up behavior: run representative queries to warm the index before measuring p95
  • Index readiness: benchmark only after the system has caught up post-ingest/backfill

Operational Best Practices

Operating an embedded vector DB follows many of the same patterns as standard TiDB administration.

Vector-Aware Metrics in TiDB Dashboard

Treat vector retrieval like any other latency-sensitive query path:

  • Watch query latency percentiles for the KNN query.
  • Watch TiFlash health and replication status if you depend on ANN indexes.
  • Use INFORMATION_SCHEMA.TIFLASH_INDEXES to validate index build completion after big ingests/backfills.

On-call tip: alert on “index build lag after ingest” the same way you’d alert on replica lag in other systems.
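The check itself can be a tiny helper (a sketch; the ROWS_STABLE_NOT_INDEXED column name follows the TiDB docs for INFORMATION_SCHEMA.TIFLASH_INDEXES, but verify it against your TiDB version before wiring this into alerting):

```python
# Interpret a row fetched from INFORMATION_SCHEMA.TIFLASH_INDEXES.
# Column names are an assumption taken from the TiDB docs; confirm
# them on your cluster with: SELECT * FROM INFORMATION_SCHEMA.TIFLASH_INDEXES;
def index_build_complete(row):
    """True when no stable rows remain unindexed for this vector index."""
    return int(row["ROWS_STABLE_NOT_INDEXED"]) == 0

index_build_complete({"ROWS_STABLE_INDEXED": 1_000_000,
                      "ROWS_STABLE_NOT_INDEXED": 0})   # True
```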

Online Re-Indexing & Auto-Scaling Recommendations

You will eventually change something:

  • A new embedding model (dimension change).
  • A new similarity function choice (cosine vs L2).
  • A new chunking strategy (data reshapes).

Plan for:

  • A backfill window.
  • Index build time.
  • Controlled rollouts by tenant or by doc source.

TiDB documents restrictions like “vector index needs a distance function,” “TiFlash must be deployed,” and “fixed dimensions are required for vector indexes.”

Full-Cluster Backup Including Vector Data

Operationally, treat embeddings as first-class data:

  • Back up the tables that store chunk text + vectors.
  • Practice restore drills that validate the retrieval path (not just row counts).
  • After restore, verify TiFlash replicas and vector indexes are healthy (and rebuild if your restore workflow requires it).

If you already run TiDB backup/restore with BR, keep your existing process and add a retrieval validation step to the runbook.

Proof in Production

In this section, you’ll see how teams are using TiDB Cloud + Vector Search to ship real AI workflows, reduce fragmentation, and keep critical analysis fast and reliable at scale.

New Retail Giant: 45% Lower Costs with a GraphRAG Knowledge System

A leading New Retail company needed a better way to search and apply internal knowledge across two high-pressure environments: IT operations (troubleshooting, maintenance, incident response) and restaurant operations (SOPs, safety protocols, recipes, training). Their documentation was scattered across sources, slowing resolution times and increasing operational friction.

They implemented a knowledge graph-based RAG (GraphRAG) system powered by TiDB Cloud and TiDB Vector Search to unify retrieval across content types and deliver contextual answers for both engineers and frontline staff.

Outcomes

  • Reduced documentation search time from ~30 minutes per query to seconds.
  • Improved retrieval accuracy by 10%.
  • Reduced training time by ~40%.
  • Drove a 70% decrease in support ticket resolution time.
  • Achieved 90% user adoption across technical and restaurant teams.
  • Delivered a 45% reduction in costs tied to maintaining fragmented knowledge bases.

Innovative SaaS Company: Real-Time Customer Feedback Intelligence with TiDB + AI

This SaaS company built an AI-powered feedback management platform that needed to ingest feedback from multiple channels, run semantic search, detect patterns, and surface actionable insights fast. Traditional analysis was too slow, and customers struggled to extract meaning from raw feedback, especially when they needed real-time response loops.

They used TiDB Cloud as the core database layer, pairing it with vector search to power advanced semantic retrieval and pattern discovery, with an AI-driven experience (dashboard + chatbot + analysis tools) on top.

Outcomes

  • 75% reduction in feedback analysis time.
  • 90% improvement in pattern recognition accuracy.
  • 99.9% system reliability.
  • 40% average increase in customer satisfaction rates.
  • Successfully processed 1M+ feedback entries.
  • 95% accuracy in AI-generated insights.

Next Steps

Ready to put this playbook into practice? Here are a couple of ways you can take the next step with TiDB.

 

Spin Up a Free TiDB Vector Sandbox

Spin up a free TiDB Cloud Vector Sandbox and validate your RAG retrieval path end-to-end, without standing up another system. In minutes, you can load a small dataset, run real vector queries, and see how quickly you can get to stable p95 latency with one database.

Here’s what you’ll get:

  1. A working sandbox that stores chunks + metadata + embeddings in TiDB
  2. A repeatable set of vector similarity queries you can run and measure
  3. A simple checklist to validate topK, filters, and joins before you scale
  4. Screenshots and query results you can share with leadership to prove feasibility fast

Start for Free

Schedule an Embedded Vector DB Architecture Workshop

Schedule an Embedded Vector DB Architecture Workshop to turn this playbook into a concrete plan for your RAG stack. Alongside TiDB experts, we’ll review your data, embeddings, query patterns, and tenancy, then map how to unify vector search and SQL into one operable platform.

Here’s what you’ll get:

  1. A tailored RAG data model plus recommended index strategy and topK defaults
  2. A tenant isolation plan for RAG workloads
  3. A phased rollout plan with canaries, validation gates, and success metrics
  4. Updated runbooks for ingestion backfills, index rebuilds, and incident triage

Request a Workshop