{"id":30379,"date":"2025-09-26T00:08:09","date_gmt":"2025-09-26T07:08:09","guid":{"rendered":"http:\/\/dev-en.pingcap.com\/?post_type=tutorial&amp;p=28974"},"modified":"2025-09-26T00:08:09","modified_gmt":"2025-09-26T07:08:09","slug":"tutorial-building-semantic-search-applications-with-tidb-vector","status":"publish","type":"tutorial","link":"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/","title":{"rendered":"Tutorial: Building Semantic Search Applications with TiDB Vector"},"content":{"rendered":"<h1 class=\"wp-block-heading\">Tutorial: Building Semantic Search Applications with TiDB Vector<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_is_Semantic_Search\"><\/span>What is Semantic Search?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Semantic search uses mathematical representations called embeddings to understand the meaning of text. Unlike traditional keyword search, it can find related content even when the exact words don&#8217;t match.<\/p>\n\n\n\n<p>For example, searching for &#8220;slow application&#8221; can find documents about &#8220;performance optimization&#8221; and &#8220;database tuning&#8221; because they&#8217;re semantically related. This makes search much more intuitive and powerful.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Vector Search Example<\/h1>\n\n\n\n<p>This example demonstrates how to build a semantic search application using TiDB and built-in embedding models. It leverages vector search to find similar items based on meaning, not just keywords. 
The app uses Streamlit for the web UI and TiDB Cloud&#8217;s free embedding models.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Semantic Search &amp; RAG with TiDB\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/QdR-vpE6Vgg?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p><em>Semantic search with vector embeddings<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Prerequisites\"><\/span>Prerequisites<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Python 3.10+<\/strong><\/li>\n\n\n\n<li><strong>A TiDB Cloud Starter cluster<\/strong>: Create a free cluster at <a href=\"https:\/\/tidbcloud.com\/?utm_source=github&amp;utm_medium=referral&amp;utm_campaign=pytidb_readme\">tidbcloud.com<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"How_to_run\"><\/span>How to run<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p><strong>Step 1<\/strong>: Clone the repository to your local machine<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>git clone https:\/\/github.com\/pingcap\/pytidb.git\ncd pytidb\/examples\/vector_search\/<\/code><\/pre>\n\n\n\n<p><strong>Step 2<\/strong>: Create a virtual environment and install the required packages<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python -m venv .venv\nsource .venv\/bin\/activate\npip install -r reqs.txt<\/code><\/pre>\n\n\n\n<p><strong>Step 3<\/strong>: Configure the connection to TiDB<\/p>\n\n\n\n<p>Go to the <a href=\"https:\/\/tidbcloud.com\/clusters\">TiDB Cloud console<\/a> and get the connection parameters, 
then set up the environment variables like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cat &gt; .env &lt;&lt;EOF\nTIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\nTIDB_PORT=4000\nTIDB_USERNAME={prefix}.root\nTIDB_PASSWORD={password}\nTIDB_DATABASE=vector_search_example\nEOF<\/code><\/pre>\n\n\n\n<p><strong>Step 4<\/strong>: Run the Streamlit app<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>streamlit run app.py<\/code><\/pre>\n\n\n\n<p><strong>Step 5<\/strong>: Open your browser and visit <code>http:\/\/localhost:8501<\/code><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"How_It_Works\"><\/span>How It Works<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>\ud83d\udca1 <strong>Source Code<\/strong>: You can find the complete source code for this example on <a href=\"https:\/\/github.com\/pingcap\/pytidb\/tree\/main\/examples\/vector_search\">GitHub<\/a>. This working example includes all the necessary files to get you started with semantic search in minutes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Embedding Function<\/h3>\n\n\n\n<p>Configure automatic embedding generation with TiDB Cloud&#8217;s free embedding models:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from pytidb import EmbeddingFunction\n\ntext_embed = EmbeddingFunction(\n    model_name=\"tidbcloud_free\/cohere\/embed-multilingual-v3\",\n)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">2. 
Schema Definition<\/h3>\n\n\n\n<p>Define a table schema with text and vector fields using <code>pytidb.schema.TableModel<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from pytidb.schema import Field, TableModel\nfrom sqlalchemy import JSON\n\nclass Chunk(TableModel):\n    __tablename__ = \"chunks\"\n    __table_args__ = {\"extend_existing\": True}\n\n    id: int = Field(primary_key=True)\n    text: str = Field()\n    # Embedding of `text`, generated automatically by text_embed\n    text_vec: list&#91;float] = text_embed.VectorField(\n        source_field=\"text\",\n    )\n    meta: dict = Field(sa_type=JSON)\n\n# `db` is assumed to be an already-connected pytidb client created elsewhere in the app\ntable = db.create_table(schema=Chunk, if_exists=\"overwrite\")<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">3. Insert Sample Data<\/h3>\n\n\n\n<p>Insert text data; the embeddings are generated automatically on insert:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sample_chunks = &#91;\n    {\n        \"text\": \"Big data analytics extracts insights from large datasets.\",\n        \"meta\": {\"language\": \"english\"},\n    },\n    {\n        \"text\": \"Internet of Things connects everyday objects to the internet.\",\n        \"meta\": {\"language\": \"english\"},\n    },\n    {\n        \"text\": \"Augmented reality overlays digital content on the real world.\",\n        \"meta\": {\"language\": \"english\"},\n    },\n]\n\nChunk = table.table_model\ntable.bulk_insert(&#91;Chunk(**chunk) for chunk in sample_chunks])<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">4. Vector Search with Filtering<\/h3>\n\n\n\n<p>Use the search API to find semantically similar content, narrowing the results with a distance threshold and a metadata filter. The variables <code>query_text<\/code>, <code>language<\/code>, <code>distance_threshold<\/code>, and <code>query_limit<\/code> are inputs supplied by the app:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>results = (\n    table.search(query_text)\n    .debug(True)\n    .filter({\"meta.language\": language})\n    .distance_threshold(distance_threshold)\n    .limit(query_limit)\n    .to_list()\n)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">5. 
Results<\/h3>\n\n\n\n<p>Get ranked results with similarity scores displayed in an interactive Streamlit interface, where closer vectors indicate more semantically similar content.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Related_Resources\"><\/span>Related Resources<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Source Code<\/strong>: <a href=\"https:\/\/github.com\/pingcap\/pytidb\/tree\/main\/examples\/vector_search\">View on GitHub<\/a><\/li>\n\n\n\n<li><strong>TiDB Vector Documentation<\/strong>: <a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\/vector-search-overview\">Vector Data Types<\/a><\/li>\n\n\n\n<li><strong>Ollama Models<\/strong>: <a href=\"https:\/\/ollama.com\/library\">Available Embedding Models<\/a><\/li>\n\n\n\n<li><strong>Hands-on Lab<\/strong>: <a href=\"https:\/\/labs.tidb.io\/lab?preview=demo_408\">Build Simple Vector Search Apps Using Jupyter Notebook<\/a> (60 min)<\/li>\n<\/ul>\n\n\n\n<p>Ready to implement semantic search in your application? Start with the complete example in the GitHub repository and see how TiDB&#8217;s vector search can transform your user experience.<\/p>\n\n\n\n<p><\/p>","protected":false},"excerpt":{"rendered":"<p>Tutorial: Building Semantic Search Applications with TiDB Vector What is Semantic Search? Semantic search uses mathematical representations called embeddings to understand the meaning of text. Unlike traditional keyword search, it can find related content even when the exact words don&#8217;t match. 
For example, searching for &#8220;slow application&#8221; can find documents about &#8220;performance optimization&#8221; and &#8220;database [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","categories":[],"tags":[],"class_list":["post-30379","tutorial","type-tutorial","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Tutorial: Building Semantic Search Applications with TiDB Vector | TiDB<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Tutorial: Building Semantic Search Applications with TiDB Vector | TiDB\" \/>\n<meta property=\"og:description\" content=\"Tutorial: Building Semantic Search Applications with TiDB Vector What is Semantic Search? Semantic search uses mathematical representations called embeddings to understand the meaning of text. Unlike traditional keyword search, it can find related content even when the exact words don&#8217;t match. 
For example, searching for &#8220;slow application&#8221; can find documents about &#8220;performance optimization&#8221; and &#8220;database [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/\" \/>\n<meta property=\"og:site_name\" content=\"TiDB\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2024\/09\/11005522\/Homepage-Ad.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1440\" \/>\n\t<meta property=\"og:image:height\" content=\"714\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"3\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/\",\"url\":\"https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/\",\"name\":\"Tutorial: Building Semantic Search Applications with TiDB Vector | 
TiDB\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/#website\"},\"datePublished\":\"2025-09-26T07:08:09+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pingcap.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Tutorial: Building Semantic Search Applications with TiDB Vector\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pingcap.com\/#website\",\"url\":\"https:\/\/www.pingcap.com\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at 
Scale\",\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pingcap.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pingcap.com\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\/\/www.pingcap.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/facebook.com\/pingcap2015\",\"https:\/\/x.com\/PingCAP\",\"https:\/\/linkedin.com\/company\/pingcap\",\"https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Tutorial: Building Semantic Search Applications with TiDB Vector | TiDB","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/","og_locale":"ko_KR","og_type":"article","og_title":"Tutorial: Building Semantic Search Applications with TiDB Vector | TiDB","og_description":"Tutorial: Building Semantic Search Applications with TiDB Vector What is Semantic Search? Semantic search uses mathematical representations called embeddings to understand the meaning of text. 
Unlike traditional keyword search, it can find related content even when the exact words don&#8217;t match. For example, searching for &#8220;slow application&#8221; can find documents about &#8220;performance optimization&#8221; and &#8220;database [&hellip;]","og_url":"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","og_image":[{"width":1440,"height":714,"url":"https:\/\/static.pingcap.com\/files\/2024\/09\/11005522\/Homepage-Ad.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@PingCAP","twitter_misc":{"Est. reading time":"3\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/","url":"https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/","name":"Tutorial: Building Semantic Search Applications with TiDB Vector | TiDB","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"datePublished":"2025-09-26T07:08:09+00:00","breadcrumb":{"@id":"https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.pingcap.com\/tutorial\/tutorial-building-semantic-search-applications-with-tidb-vector\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingcap.com\/"},{"@type":"ListItem","position":2,"name":"Tutorial: Building Semantic Search Applications with TiDB 
Vector"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"\ud2f0DB","description":"TiDB | SQL at Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]}]}},"_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tutorial\/30379","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tutorial"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/tutorial"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=30379"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/categories?post=30379"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tags?post=30379"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}