{"id":30378,"date":"2025-09-24T23:26:15","date_gmt":"2025-09-25T06:26:15","guid":{"rendered":"http:\/\/dev-en.pingcap.com\/?post_type=tutorial&amp;p=28971"},"modified":"2025-09-24T23:26:15","modified_gmt":"2025-09-25T06:26:15","slug":"tutorial-auto-embedding-with-built-in-models","status":"publish","type":"tutorial","link":"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-auto-embedding-with-built-in-models\/","title":{"rendered":"Tutorial: Auto Embedding with Built-in Models"},"content":{"rendered":"<h1 class=\"wp-block-heading\">Tutorial: Auto Embedding with Built-in Models<\/h1>\n\n\n\n<p>Learn how to automatically generate embeddings for your text data using TiDB&#8217;s built-in embedding models with the PyTiDB SDK.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-auto-embedding\"><span class=\"ez-toc-section\" id=\"What_is_Auto_Embedding\"><\/span>What is Auto Embedding?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Auto embedding automatically generates vector representations of your text data when you insert it into the database. This eliminates the need to generate embeddings manually, making it easier to build AI applications with semantic search capabilities.<\/p>\n\n\n\n<p>With auto embedding, you can insert plain text and let TiDB handle the embedding generation in the background. 
When you search, the query text is also automatically embedded, making the entire process seamless.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Auto Embedding with TiDB\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/j9MTm28aNwo?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"prerequisites\"><span class=\"ez-toc-section\" id=\"Prerequisites\"><\/span>Prerequisites<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Python 3.10+<\/strong><\/li>\n\n\n\n<li><strong>A TiDB Cloud Starter cluster<\/strong>: Create a free cluster at <a href=\"https:\/\/tidbcloud.com\/?utm_source=github&amp;utm_medium=referral&amp;utm_campaign=pytidb_readme\">tidbcloud.com<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-to-run\"><span class=\"ez-toc-section\" id=\"How_to_run\"><\/span>How to run<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p><strong>Step 1<\/strong>: Clone the repository to your local machine<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>git clone https:\/\/github.com\/pingcap\/pytidb.git\ncd pytidb\/examples\/auto_embedding\/\n<\/code><\/pre>\n\n\n\n<p><strong>Step 2<\/strong>: Create a virtual environment and install the required packages<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python -m venv .venv\nsource .venv\/bin\/activate\npip install -r reqs.txt\n<\/code><\/pre>\n\n\n\n<p><strong>Step 3<\/strong>: Configure the connection to TiDB<\/p>\n\n\n\n<p>Go to the <a href=\"https:\/\/tidbcloud.com\/clusters\">TiDB Cloud console<\/a> and get the connection parameters, then set up the 
environment variables like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cat &gt; .env &lt;&lt;EOF\nTIDB_HOST={gateway-region}.prod.aws.tidbcloud.com\nTIDB_PORT=4000\nTIDB_USERNAME={prefix}.root\nTIDB_PASSWORD={password}\nTIDB_DATABASE=auto_embedding_demo\n\n# Using TiDB Cloud Free embedding model by default\nEMBEDDING_PROVIDER=tidbcloud_free\nEOF\n<\/code><\/pre>\n\n\n\n<p><strong>Step 4<\/strong>: Run the demo<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python main.py\n<\/code><\/pre>\n\n\n\n<p><strong>Step 5<\/strong>: Expected output<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>=== Define embedding function ===\nEmbedding function (model id: tidbcloud_free\/amazon\/titan-embed-text-v2) defined\n\n=== Define table schema ===\nTable created\n\n=== Insert sample data ===\nInserted 3 chunks\n\n=== Perform vector search ===\nid: 1, text: TiDB is a distributed database..., distance: 0.303\nid: 2, text: PyTiDB is a Python library..., distance: 0.422\nid: 3, text: LlamaIndex is a Python library..., distance: 0.526\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-it-works\"><span class=\"ez-toc-section\" id=\"How_It_Works\"><\/span>How It Works<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>\ud83d\udca1 <strong>Source Code<\/strong>: You can find the complete source code for this example on <a href=\"https:\/\/github.com\/pingcap\/pytidb\/tree\/main\/examples\/auto_embedding\">GitHub<\/a>. This working example includes all the necessary files to get you started with auto embedding in minutes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1-define-embedding-function\">1. 
Define Embedding Function<\/h3>\n\n\n\n<p>Configure the embedding model that will automatically generate vectors for your text data:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from pytidb.embeddings import EmbeddingFunction\n\n# Define embedding function\nembed_func = EmbeddingFunction(\n    model_name=\"tidbcloud_free\/amazon\/titan-embed-text-v2\"\n    # No API key required for TiDB Cloud free models\n)\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"2-schema-definition\">2. Schema Definition<\/h3>\n\n\n\n<p>Define a table schema with auto-embedding vector fields using <code>pytidb.schema.TableModel<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from pytidb.schema import Field, TableModel\nfrom sqlalchemy import TEXT\n\nclass Chunk(TableModel):\n    __tablename__ = \"chunks\"\n    __table_args__ = {\"extend_existing\": True}\n\n    id: int = Field(primary_key=True)\n    text: str = Field(sa_type=TEXT)\n    text_vec: list&#91;float] = embed_func.VectorField(source_field=\"text\")\n\n# db is assumed to be a connected pytidb TiDBClient (see main.py in the example)\ntable = db.create_table(schema=Chunk, if_exists=\"overwrite\")\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3-automatic-data-insertion\">3. Automatic Data Insertion<\/h3>\n\n\n\n<p>Insert text data; the embeddings are generated automatically in the background:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Insert sample data - embeddings generated automatically\ntable.bulk_insert(&#91;\n    Chunk(text=\"TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.\"),\n    Chunk(text=\"PyTiDB is a Python library for developers to connect to TiDB.\"),\n    Chunk(text=\"LlamaIndex is a Python library for building AI-powered applications.\"),\n])\n\nprint(\"\u2705 Text inserted, embeddings generated automatically!\")\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4-seamless-vector-search\">4. 
Seamless Vector Search<\/h3>\n\n\n\n<p>Search using natural language queries &#8211; embedding happens automatically:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Search with natural language (query embedding happens automatically)\nresults = (\n    table.search(\"database for AI applications\")\n    .distance_threshold(0.8)\n    .limit(5)\n    .to_pandas()\n)\n\nfor _, row in results.iterrows():\n    print(f\"\ud83d\udcc4 {row&#91;'text']} (distance: {row&#91;'_distance']:.3f})\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"related-resources\"><span class=\"ez-toc-section\" id=\"Related_Resources\"><\/span>Related Resources<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Source Code<\/strong>: <a href=\"https:\/\/github.com\/pingcap\/pytidb\/tree\/main\/examples\/auto_embedding\">View on GitHub<\/a><\/li>\n\n\n\n<li><strong>TiDB Vector Documentation<\/strong>: <a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\/vector-search-overview\">Vector Data Types<\/a><\/li>\n\n\n\n<li><strong>PyTiDB Documentation<\/strong>: <a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\/vector-search-embedding-models\">Auto Embedding Guide<\/a><\/li>\n\n\n\n<li><strong>Hands-on Lab<\/strong>: <a href=\"https:\/\/labs.tidb.io\/lab?preview=demo_410\">Build Auto-Embedding Apps Using Jupyter Notebook<\/a> (45 min)<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Tutorial: Auto Embedding with Built-in Models Learn how to automatically generate embeddings for your text data using TiDB&#8217;s built-in embedding models with PyTiDB SDK. What is Auto Embedding? Auto embedding automatically generates vector representations of your text data when inserting it into the database. 
This eliminates the need to manually generate embeddings, making it easier [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","categories":[],"tags":[29],"class_list":["post-30378","tutorial","type-tutorial","status-publish","hentry","tag-tutorial"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Tutorial: Auto Embedding with Built-in Models | TiDB<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-auto-embedding-with-built-in-models\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Tutorial: Auto Embedding with Built-in Models | TiDB\" \/>\n<meta property=\"og:description\" content=\"Tutorial: Auto Embedding with Built-in Models Learn how to automatically generate embeddings for your text data using TiDB&#8217;s built-in embedding models with PyTiDB SDK. What is Auto Embedding? Auto embedding automatically generates vector representations of your text data when inserting it into the database. 
This eliminates the need to manually generate embeddings, making it easier [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-auto-embedding-with-built-in-models\/\" \/>\n<meta property=\"og:site_name\" content=\"TiDB\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2024\/09\/11005522\/Homepage-Ad.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1440\" \/>\n\t<meta property=\"og:image:height\" content=\"714\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"3\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/\",\"url\":\"https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/\",\"name\":\"Tutorial: Auto Embedding with Built-in Models | 
TiDB\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/#website\"},\"datePublished\":\"2025-09-25T06:26:15+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pingcap.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Tutorial: Auto Embedding with Built-in Models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pingcap.com\/#website\",\"url\":\"https:\/\/www.pingcap.com\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at Scale\",\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pingcap.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pingcap.com\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\/\/www.pingcap.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/facebook.com\/pingcap2015\",\"https:\/\/x.com\/PingCAP\",\"https:\/\/linkedin.com\/company\/pingcap\"
,\"https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Tutorial: Auto Embedding with Built-in Models | TiDB","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-auto-embedding-with-built-in-models\/","og_locale":"ko_KR","og_type":"article","og_title":"Tutorial: Auto Embedding with Built-in Models | TiDB","og_description":"Tutorial: Auto Embedding with Built-in Models Learn how to automatically generate embeddings for your text data using TiDB&#8217;s built-in embedding models with PyTiDB SDK. What is Auto Embedding? Auto embedding automatically generates vector representations of your text data when inserting it into the database. This eliminates the need to manually generate embeddings, making it easier [&hellip;]","og_url":"https:\/\/www.pingcap.com\/ko\/tutorial\/tutorial-auto-embedding-with-built-in-models\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","og_image":[{"width":1440,"height":714,"url":"https:\/\/static.pingcap.com\/files\/2024\/09\/11005522\/Homepage-Ad.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@PingCAP","twitter_misc":{"Est. 
reading time":"3\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/","url":"https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/","name":"Tutorial: Auto Embedding with Built-in Models | TiDB","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"datePublished":"2025-09-25T06:26:15+00:00","breadcrumb":{"@id":"https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.pingcap.com\/tutorial\/tutorial-auto-embedding-with-built-in-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingcap.com\/"},{"@type":"ListItem","position":2,"name":"Tutorial: Auto Embedding with Built-in Models"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"\ud2f0DB","description":"TiDB | SQL at 
Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]}]}},"_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tutorial\/30378","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tutorial"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/tutorial"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=30378"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/categories?post=30378"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tags?post=30378"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}