In the rapidly evolving landscape of machine learning and database technologies, combining the strengths of different tools can lead to innovative solutions. One such powerful combination is using Jina AI’s embedding capabilities with TiDB’s vector search functionality. This blog will guide you through building a semantic cache service using Jina AI Embeddings and TiDB Vector.

What is a Semantic Cache?

A semantic cache stores the results of expensive queries and reuses them when the same or similar queries are made. This type of cache uses semantic understanding rather than exact key matching, making it particularly useful in applications requiring natural language processing or similar complex data retrieval tasks.
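
For example, an exact-match cache treats "what is tidb" and "what's tidb" as different keys, while a semantic cache matches them by meaning. Below is a minimal, purely illustrative sketch of that idea; embed and cosine_distance are hypothetical stand-ins for the roles that Jina AI and TiDB Vector play later in this post.

def lookup(cache: dict, query: str, max_distance: float = 0.1):
    """Return a cached result whose key is semantically close to `query`, or None."""
    # `cache` maps each original query string to (embedding, cached result).
    # `embed` and `cosine_distance` are hypothetical stand-ins for an embedding
    # model and a vector similarity function.
    query_vec = embed(query)
    best_value, best_dist = None, float("inf")
    for key_vec, value in cache.values():
        dist = cosine_distance(query_vec, key_vec)
        if dist < best_dist:
            best_value, best_dist = value, dist
    # Reuse the cached result only if it is close enough in meaning
    if best_value is not None and best_dist <= max_distance:
        return best_value
    return None  # cache miss: run the expensive query instead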

Why Jina AI and TiDB?

  • Jina AI: Provides robust embedding capabilities, converting text into high-dimensional vectors that capture semantic meaning.
  • TiDB Vector: Extends the TiDB database to support efficient vector operations, enabling fast similarity searches on high-dimensional data.

Setting Up the Environment

Prerequisites

Before you begin, make sure you have:

  • Python 3.8 or higher
  • A TiDB Serverless cluster set up and running
  • An API key from Jina AI

Step-by-Step Implementation

1. Configuration

First, set up your environment configuration. Create a .env file to store your database URI and TTL (Time to Live) settings.

DATABASE_URI=mysql+pymysql://<username>:<password>@<host>:<port>/<database>?ssl_mode=VERIFY_IDENTITY&ssl_ca=/etc/ssl/cert.pem
TIME_TO_LIVE=604800  # Default is 1 week
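
The snippets in the following steps reference DATABASE_URI and TIME_TO_LIVE as module-level variables. One way to load them at startup (a minimal sketch using python-dotenv, which is installed in the next step):

import os
from dotenv import load_dotenv

# Read DATABASE_URI and TIME_TO_LIVE from the .env file
load_dotenv()

DATABASE_URI = os.getenv("DATABASE_URI")
TIME_TO_LIVE = int(os.getenv("TIME_TO_LIVE", "604800"))  # default: 1 week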

2. Install Required Libraries

Install the necessary Python packages:

pip install fastapi requests sqlmodel sqlalchemy python-dotenv tidb-vector

3. Define the Cache Model

Use SQLModel to define your cache model, incorporating vector fields and automatic timestamping.

from sqlmodel import SQLModel, Field, Column, DateTime, String, Text
from sqlalchemy import func
from tidb_vector.sqlalchemy import VectorType
from typing import List, Optional
from datetime import datetime

class Cache(SQLModel, table=True):
    __table_args__ = {
        # Set the TTL (Time to Live) so TiDB removes expired cache entries automatically
        'mysql_TTL': f'created_at + INTERVAL {TIME_TO_LIVE} SECOND',
    }

    id: Optional[int] = Field(default=None, primary_key=True)
    key: str = Field(sa_column=Column(String(255), unique=True, nullable=False))
    key_vec: Optional[List[float]] = Field(
        sa_column=Column(
            VectorType(768),  # Vector column with 768 dimensions, matching the Jina embedding size
            default=None,
            comment="hnsw(distance=cosine)",  # Build an HNSW (Hierarchical Navigable Small World) index using cosine distance, matching the cosine_distance query below
            nullable=False,
        )
    )
    value: Optional[str] = Field(sa_column=Column(Text))
    created_at: datetime = Field(
        sa_column=Column(DateTime, server_default=func.now(), nullable=False)
    )
    updated_at: datetime = Field(
        sa_column=Column(DateTime, server_default=func.now(), onupdate=func.now(), nullable=False)
    )

4. Create the Database Engine

Create the engine and the database schema.

from sqlmodel import create_engine

# Create the engine using the database URI
engine = create_engine(DATABASE_URI)
# Create all tables in the database
SQLModel.metadata.create_all(engine)

5. FastAPI Setup

Set up the FastAPI application and endpoints for setting and getting cache entries.

from fastapi import FastAPI, Depends, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from sqlmodel import Session, select

# Initialize FastAPI app
app = FastAPI()
security = HTTPBearer()

@app.post("/set")
def set_cache(
    cache: Cache,
    credentials: HTTPAuthorizationCredentials = Depends(security),
):
    # Generate the embedding for the given key using Jina AI
    cache.key_vec = generate_embeddings(credentials.credentials, cache.key)
    with Session(engine) as session:
        session.add(cache)
        session.commit()
    return {'message': 'Cache has been set'}

@app.get("/get/{key}")
def get_cache(
    key: str,
    credentials: HTTPAuthorizationCredentials = Depends(security),
    max_distance: Optional[float] = 0.1,
):
    # Generate the embedding for the given key using Jina AI
    key_vec = generate_embeddings(credentials.credentials, key)
    # Cap the allowed distance at 0.3 to avoid overly loose matches
    max_distance = min(max_distance, 0.3)

    with Session(engine) as session:
        # Fetch the nearest cached key by cosine distance
        result = session.exec(
            select(
                Cache,
                Cache.key_vec.cosine_distance(key_vec).label('distance')
            ).order_by(
                'distance'
            ).limit(1)
        ).first()

        if result is None:
            raise HTTPException(status_code=404, detail="Cache not found")

        cache, distance = result
        if distance > max_distance:
            raise HTTPException(status_code=404, detail="Cache not found")

        return {
            "key": cache.key,
            "value": cache.value,
            "distance": distance
        }

6. Generate Embeddings

Implement a function to get embeddings from Jina AI.

import requests
import os
from dotenv import load_dotenv

load_dotenv()

def generate_embeddings(jinaai_api_key: str, text: str):
    JINAAI_API_URL = 'https://api.jina.ai/v1/embeddings'
    JINAAI_HEADERS = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {jinaai_api_key}'
    }
    JINAAI_REQUEST_DATA = {
        'input': [text],
        'model': 'jina-embeddings-v2-base-en'  # Use the Jina Embeddings model with 768 dimensions
    }
    response = requests.post(JINAAI_API_URL, headers=JINAAI_HEADERS, json=JINAAI_REQUEST_DATA)
    # Fail fast on authentication or quota errors
    response.raise_for_status()
    # Extract and return the embedding from the response
    return response.json()['data'][0]['embedding']
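
To sanity-check the function, you can call it directly with your Jina AI API key (a placeholder is shown below) and confirm that the returned vector has 768 dimensions, matching the VectorType(768) column defined earlier.

# Placeholder key shown; substitute your real Jina AI API key
embedding = generate_embeddings("<your jina token>", "what is tidb")
print(len(embedding))  # expected: 768 for jina-embeddings-v2-base-en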

How to Use This App

Prerequisites

  • A running TiDB Serverless cluster with vector search enabled
  • Python 3.8 or later
  • A Jina AI API key

Run the example

1. Clone this repo

git clone https://github.com/pingcap/tidb-vector-python.git

2. Create a virtual environment

cd tidb-vector-python/examples/semantic-cache
python3 -m venv .venv
source .venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Set the environment variables

Get the HOST, PORT, USERNAME, PASSWORD, and DATABASE from the TiDB Cloud console, as described in the [Prerequisites](../README.md#prerequisites) section. Then set the following environment variables:

export DATABASE_URI="mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true"

Alternatively, create a .env file containing the same variables.

5. Run this example

Start the semantic cache server:

uvicorn cache:app --reload

6. Test the API

Get the Jina AI API key from the Jina AI Embedding API page, and save it somewhere safe for later use.

  • POST /set
curl --location ':8000/set' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <your jina token>' \
--data '{
    "key": "what is tidb",
    "value": "tidb is a mysql-compatible and htap database"
}'
  • GET /get/<key>
curl --location ':8000/get/what%27s%20tidb%20and%20tikv?max_distance=0.5' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <your jina token>'

Conclusion

By combining Jina AI’s powerful embedding capabilities with TiDB’s efficient vector operations, you can build a robust semantic cache service. This service is ideal for applications requiring fast, intelligent caching and retrieval of semantically similar data. Start experimenting with this setup to explore its full potential in your projects.

More Demos

The tidb-vector-python repository includes more examples that show how to interact with TiDB Vector in different scenarios.

Happy coding!

