Hybrid Search for RAG: Fix Retrieval Accuracy in AI

Editor’s note: This post originally appeared on The New Stack and is republished with permission. The original version is available 여기.

Key Takeaways

Semantic similarity isn’t factual correctness. Stale or out-of-scope docs still look “similar”, and that gap is where RAG breaks.

The missing data lives in columns, not vectors. This is a database problem, not an embeddings one.

Hybrid search combines vector and SQL filters in one query, with 94% vs 72% recall for ~15–30ms.

Skip the “vector sidecar.” A separate vector database means sync gaps and duplicated ACLs. One database removes them.

A few months ago, one of our users filed a bug report that stuck with me. They had built a customer-support agent on top of a RAG pipeline. A user asked whether they could return a laptop purchased three weeks earlier. The agent retrieved a return policy document, quoted a 30-day window, and told the customer to ship it back. Perfectly confident answer. Completely wrong.

The document was real. It just happened to be from 2023. The current policy had been updated to a 14-day window for electronics. Vector similarity has no notion of recency or scope as cosine distance between the query embedding and the 2023 policy was excellent. Why wouldn’t it be? The words were almost identical.

When I dug into this, I realized it wasn’t really a bug. It was an architectural problem, one that I’d been seeing more and more as teams moved RAG from prototype to production. And it changed how I think about what a database needs to do in the AI era.

The Gap Nobody Talks About

We’ve spent the last two years as an industry fixated on hallucination. RAG was the answer: Ground the model in real documents. And it works. But somewhere along the way, we started treating retrieval as a solved problem. It isn’t.

Here’s the thing I keep coming back to: Semantic similarity and factual correctness are not the same thing. A vector search finds documents that are close in meaning to your query. That’s useful, but “close in meaning” doesn’t mean “correct for this context.” A deprecated policy is semantically similar to the current one. A document scoped to enterprise customers is semantically similar to a query from a free-tier user. A confidential document in tenant A’s namespace is semantically similar to a query from tenant B.

I call this the retrieval accuracy gap. The distance between what vector similarity thinks is relevant and what your application actually needs to be correct. And you can’t close it with better embeddings. The missing information—timestamps, scopes, permissions—is structured data. It lives in columns, not in vector space.

This is a database problem.

What I Mean by Hybrid Search

When I say hybrid search, I mean something specific: a single database query that combines vector similarity with structured SQL predicates. Not a two-phase pipeline where you do a vector search, get back 100 candidates, and then filter in application code. A single query, optimized holistically by the database engine.

The difference matters more than it sounds. When filtering happens in application code, you’re doing the expensive work—scanning the full vector index—before applying the cheap constraints. That’s backwards. A database that understands both vector and relational operations can use selectivity estimates to decide whether to filter first or scan first. It’s the same query planning logic we’ve had in relational databases for decades. It just needs to extend to vector indexes.

Let me show you what this looks like concretely. Assume a schema like this:

CREATE TABLE documents (
  id         BIGINT PRIMARY KEY AUTO_INCREMENT,
  content    TEXT,
  embedding  VECTOR(1536),
  team_id    BIGINT NOT NULL,
  doc_type   VARCHAR(50),
  updated_at DATETIME NOT NULL,
  status     ENUM('active','deprecated','draft'),
  INDEX idx_embedding USING HNSW (embedding),
  INDEX idx_team_status (team_id, status)
);

Nothing exotic. Standard relational schema with a vector column. Now here are three query patterns that solve the failure modes I described.

Pattern 1: Recency Filtering

The stale-document problem disappears when you add a time constraint:

SELECT id, content,
       VEC_COSINE_DISTANCE(embedding, @query_vec) AS distance
FROM documents
WHERE status = 'active'
  AND updated_at >= NOW() - INTERVAL 90 DAY
ORDER BY distance
LIMIT 5;

The WHERE clause prunes the candidate set before the vector scan. In a 10-million-row corpus, this typically eliminates 60–80% of the rows. The database gets faster 그리고 more accurate at the same time. That’s the kind of tradeoff I like.

Pattern 2: Tenant Isolation via Join

This is the pattern I worry about most, because when it fails, it’s a security incident, not just a bad answer:

SELECT d.id, d.content,
 VEC_COSINE_DISTANCE(d.embedding, @query_vec) AS distance
FROM documents d
JOIN user_permissions p
 ON p.team_id = d.team_id
WHERE p.user_id = @current_user
 AND d.status = 'active'
ORDER BY distance
LIMIT 5;

A relational join against a permissions table. No matter how semantically similar a document is to the query, the user never sees content outside their scope. The constraint is enforced by the database engine, not by application code that someone might forget to update.

Try doing this with a standalone vector database. You’d have to duplicate your entire ACL into metadata tags, re-index every time permissions change, and hope your tag-based filtering handles the combinatorial explosion of permission groups. I’ve watched teams try this. It doesn’t end well.

Pattern 3: Category Ranking with Aggregation

Sometimes the right answer isn’t a single document but a pattern across many documents. This query groups matches by type and finds where the signal is densest:

SELECT d.doc_type,
       COUNT(*)                                  AS match_count,
       MIN(VEC_COSINE_DISTANCE(d.embedding, @query_vec)) AS best_dist,
       GROUP_CONCAT(d.id ORDER BY
         VEC_COSINE_DISTANCE(d.embedding, @query_vec)) AS doc_ids
FROM documents d
WHERE d.status = 'active'
  AND VEC_COSINE_DISTANCE(d.embedding, @query_vec) < 0.3
GROUP BY d.doc_type
ORDER BY match_count DESC, best_dist ASC
LIMIT 3;

This tells the LLM: “The answer is most likely in the FAQ documents (7 matches) rather than the blog posts (2 matches).” Then you retrieve the top documents from the winning category. It’s a GROUP BY. Vector databases can’t do GROUP BY. This is relational algebra, and it changes retrieval quality dramatically when your corpus contains overlapping document types.

What the Numbers Look Like

We modeled both approaches after production workloads against a 10-million-row enterprise knowledge base. This includes 18 months of content, mixed document types, and 500 queries with human-labeled ground truth. Here’s what we found:

Metric	Pure Vector (top-5)	Hybrid (top-5)
Recall@5	72%	94%
Precision@5	58%	87%
Stale doc in top 5	23%	< 1%
Cross-tenant leak rate	8%	0%
p50 latency	45 ms	62 ms
p99 latency	120 ms	155 ms

The latency cost is 15–30 milliseconds. Invisible to the user. The zero cross-tenant leak rate isn’t a statistical improvement as it’s a guarantee enforced by a relational join. That’s the kind of property you can bring to a security review.

What I find interesting is that hybrid search is actually faster in many real-world cases, because the structured filters reduce the vector search space so dramatically. When 70% of your corpus is pruned before the vector scan even starts, the wall-clock time drops. Please keep in mind that results will vary depending on corpus distribution and filtering selectivity.

The “Vector Sidecar” Anti-Pattern

I want to talk about an architectural pattern I see constantly, because I think it’s the root cause of most RAG quality problems in production.

The pattern goes like this: you have a primary database—usually MySQL or PostgreSQL—where your application data lives. Then you stand up a separate vector database for embeddings. Now you need a sync pipeline to keep them consistent. Every document insert, update, and deletion has to propagate to both systems. You’re maintaining two schemas, two connection pools, two monitoring dashboards, and a fragile ETL job in between.

I call this the vector sidecar, and it creates three problems that compound over time:

Consistency windows. There’s always a gap where the two systems disagree. A document can be marked as deprecated in your primary database but still gets returned by the vector store until the sync catches up. In the return-policy example, this is exactly what happened—the policy was updated in the primary database, but the vector index was stale.
No cross-system joins. You can’t join your ACL table against your vector index in a single query. So you end up duplicating permission data as metadata tags, which means every permission change requires a re-index. At scale, this gets expensive and error-prone.
Double the operational surface. Two databases means two sets of on-call rotations, two capacity planning models, two failure modes. I’ve been building distributed systems for over a decade, and the single most effective way to improve reliability is to reduce the number of moving parts.

The alternative is straightforward: Put vectors and structured data in the same database. One connection string, transaction boundary, and consistency model. The database handles the query planning, deciding whether to scan the vector index first or filter first based on selectivity estimates.

This is one of the reasons we built vector support directly into 티DB. When we started seeing our users run into these exact problems—consistency bugs, cross-tenant leaks, operational complexity—the answer wasn’t a better sync pipeline. It was eliminating the need for a sync pipeline entirely.

Why SQL Compatibility Matters Here

There’s a practical dimension to this that I think is underappreciated. When we made the decision years ago to implement the MySQL wire protocol in TiDB, it was about reducing adoption friction. But in the AI era, it turns out to have a deeper benefit.

SQL is the lingua franca of application development. Every ORM speaks it, every connection pool supports it, and every engineer on your team has written SQL queries. When your AI database speaks the same protocol, the hybrid search patterns I described above aren’t exotic—they’re just SQL queries. Your team doesn’t need to learn a new query language, a new client library, or a new operational model. They write the same SQL they’ve always written, with a vector distance function added.

I’ve learned from watching thousands of TiDB deployments that adoption barriers matter more than feature lists. The best architecture is the one your team can actually ship.

When You Don’t Need Hybrid Search

I want to be honest about where pure vector search is perfectly fine, because I think the credibility of any technical recommendation depends on acknowledging its limits.

Single-tenant, single-document-type corpora. If you’re building a knowledge base search over one product’s documentation for one team, pure vector search with a good embedding model will serve you well. The failure modes I described arise from heterogeneity, or multiple tenants, multiple document types, multiple time horizons.
Exploratory or creative use cases. If the user is brainstorming—“find me something related to this idea”—approximate retrieval is actually what you want. Strict correctness isn’t the goal.
Human-in-the-loop workflows. If a human reviews every result before it’s acted on, the cost of an occasional stale document is manageable. The stakes change when agents take actions autonomously.

If correctness is optional, vectors are enough; if correctness is required, they aren’t. But the moment you have multiple tenants, documents that expire, or any scenario where a wrong retrieval leads to a wrong action without human review, you need hybrid search. For most production RAG systems, that’s day one.

The Middle Layer

I’ve been thinking a lot about where the real leverage is in the AI stack. The industry has invested enormous energy in two layers: the embedding model (how well do we encode meaning?) and the generation model (how well do we synthesize answers?). Both are important. But between them sits a third layer that’s been treated as a commodity: the database query that actually retrieves context.

This middle layer is where correctness lives. The embedding model turns a question into a vector. The generation model turns documents into an answer. But the retrieval query is what decides which documents the model sees. Get this wrong, and everything downstream is wrong too, no matter how good your embedding or your LLM.

Hybrid search—combining vector similarity with relational filters in a single query, in a single database—is how you close the retrieval accuracy gap. It’s not a sophisticated technique. It’s standard SQL with a distance function. The only prerequisite is a database that doesn’t force you to choose between vectors and relations.

When we started building TiDB more than a decade ago, our thesis was that the database should adapt to the application, not the other way around. That meant MySQL compatibility so you didn’t have to rewrite your app. It meant horizontal scalability so you didn’t have to re-architect at growth inflection points. And now it means native vector support so you don’t have to bolt on a separate system to build AI features.

The thesis hasn’t changed. The applications have.

Try hybrid search in TiDB and spin up a free cluster to run vector and SQL in a single query.

시작하기