{"id":17493,"date":"2024-06-04T20:23:08","date_gmt":"2024-06-05T03:23:08","guid":{"rendered":"https:\/\/www.pingcap.com\/?post_type=article&#038;p=17493"},"modified":"2024-06-04T20:28:37","modified_gmt":"2024-06-05T03:28:37","slug":"managing-vectors-in-the-same-way-as-operating-on-mysql-data","status":"publish","type":"article","link":"https:\/\/www.pingcap.com\/ko\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/","title":{"rendered":"Managing Vectors in the Same Way as Operating on MySQL Data"},"content":{"rendered":"\n<p>As the landscape of data management continues to evolve, the integration of vector search capabilities into traditional relational databases has emerged as a powerful advancement. <a href=\"https:\/\/tidb.cloud\/\">TiDB Serverless<\/a>, now offers a <a href=\"https:\/\/www.pingcap.com\/blog\/integrating-vector-search-into-tidb-for-ai-applications\/\">vector search feature<\/a>, allowing users to manage vectors in a manner akin to operating on MySQL data. This seamless integration of vector search within a relational database framework not only simplifies complex data operations but also extends the versatility of database applications.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Understanding_Vector_Search\"><\/span>Understanding Vector Search<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Vector search enables semantic and similarity searches across various data types, such as text, images, videos, and audio. Instead of searching the data itself, vector search focuses on the meanings of the data. This is achieved by representing data as points in a multidimensional space, where the spatial relationships between points indicate semantic similarities.<\/p>\n\n\n\n<p>In TiDB, vector embeddings are used to represent data. These embeddings can be stored alongside traditional data within the same database. This unique feature allows users to perform sophisticated searches and analyses while leveraging the robustness and scalability of a relational database.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Implementing_Vector_Search_in_TiDB\"><\/span>Implementing Vector Search in TiDB<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Setting Up TiDB for Vector Search<\/h3>\n\n\n\n<p>To get started with TiDB&#8217;s vector search feature, follow these steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li><strong>Sign Up for TiDB Cloud<\/strong>: Create an account on <a href=\"https:\/\/tidb.cloud\/ai\">TiDB Cloud<\/a> and sign up for the service.<\/li>\n\n\n\n<li><strong>Select the Appropriate Region<\/strong>: As of now, vector search is available only in the <code>eu-central-1<\/code> region. Select this region when setting up your TiDB serverless cluster.<\/li>\n\n\n\n<li><strong>Create a TiDB Serverless Cluster<\/strong>: Follow the tutorial on <a href=\"https:\/\/docs.google.com\/document\/d\/16zJADoPfmLNHP5tFqREOCslwmPUGhMIqXTL4QQ6vf2g\/edit#heading=h.4t6z6fprc37x\">TiDB Cloud<\/a> to create a cluster with vector search support enabled.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Basic Usage: Insert and Query Vectors<\/h3>\n\n\n\n<p>Once your cluster is set up, you can start by creating tables and inserting data that includes vector embeddings. Here is an example of how to create a table with a 3-dimensional vector field and insert records:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE vector_table (\n    id INT PRIMARY KEY, \n    doc TEXT, \n    embedding VECTOR(3)\n);\n\nINSERT INTO vector_table VALUES \n    (1, 'apple', '&#91;1,1,1]'),\n    (2, 'banana', '&#91;1,1,2]'),\n    (3, 'dog', '&#91;2,2,2]');<\/code><\/pre>\n\n\n\n<p>You can query the table to retrieve all records:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT * FROM vector_table;<\/code><\/pre>\n\n\n\n<p>To find the nearest neighbors to a given vector based on cosine distance, you can execute the following query:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT * FROM vector_table \nORDER BY vec_cosine_distance(embedding, '&#91;1,1,3]') \nLIMIT 3;<\/code><\/pre>\n\n\n\n<p>This query orders the results by their cosine similarity to the vector <code>[1,1,3]<\/code>, returning the closest matches.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Advanced Vector Operations<\/h3>\n\n\n\n<p>TiDB also supports various vector distance functions, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>Vec_L1_Distance<\/code>: Manhattan Distance<\/li>\n\n\n\n<li><code>Vec_L2_Distance<\/code>: Squared Euclidean Distance<\/li>\n\n\n\n<li><code>Vec_Cosine_Distance<\/code>: Cosine Distance<\/li>\n\n\n\n<li><code>Vec_Negative_Inner_Product<\/code>: Negative Inner Product<\/li>\n<\/ul>\n\n\n\n<p>You can use these functions in your SQL queries to perform different types of vector comparisons. For example, to compute the cosine distance between two vectors:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT vec_cosine_distance('&#91;1,1,1]', '&#91;1,2,3]');<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Indexing for Faster Vector Searches<\/h3>\n\n\n\n<p>To optimize vector search performance, you can create an HNSW (Hierarchical Navigable Small World) index on your vector fields. This index type is suitable for vector search queries and can be defined during table creation:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE vector_table_with_index (\n    id INT PRIMARY KEY, \n    doc TEXT, \n    embedding VECTOR(3) COMMENT 'hnsw(distance=cosine)'\n);<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Integration with AI Frameworks<\/h3>\n\n\n\n<p>TiDB vector search integrates seamlessly with popular AI frameworks, such as LangChain and LlamaIndex. These integrations enable advanced applications like semantic search using OpenAI embeddings. For example, you can set up a semantic search environment with the following steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li><strong>Prepare Your Environment<\/strong>: Set up a virtual environment and install the necessary dependencies.<\/li>\n\n\n\n<li><strong>Define Your Data Model<\/strong>: Create a data model using the <code>peewee<\/code> ORM with vector support.<\/li>\n\n\n\n<li><strong>Generate and Insert Embeddings<\/strong>: Use OpenAI&#8217;s API to generate embeddings for your data and insert them into your TiDB table.<\/li>\n\n\n\n<li><strong>Query for Similar Documents<\/strong>: Execute queries to find documents semantically similar to a given query.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>import os\nfrom openai import OpenAI\nfrom peewee import Model, MySQLDatabase, TextField, SQL\nfrom tidb_vector.peewee import VectorField\n\n# Set up OpenAI client and database connection\nclient = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))\ndb = MySQLDatabase('test', user=os.getenv('TIDB_USERNAME'), password=os.getenv('TIDB_PASSWORD'), host=os.getenv('TIDB_HOST'), port=4000, ssl_verify_cert=True, ssl_verify_identity=True)\n\n# Define the data model\nclass DocModel(Model):\n    text = TextField()\n    embedding = VectorField(dimensions=1536)\n\n    class Meta:\n        database = db\n        table_name = \"doc_test\"\n\n# Connect to the database and create the table\ndb.connect()\ndb.create_tables(&#91;DocModel])\n\n# Generate embeddings and insert into the database\ndocuments = &#91;\"Example document 1\", \"Example document 2\", \"Example document 3\"]\nembeddings = &#91;r.embedding for r in client.embeddings.create(input=documents, model=\"text-embedding-3-small\").data]\ndata_source = &#91;{\"text\": doc, \"embedding\": emb} for doc, emb in zip(documents, embeddings)]\nDocModel.insert_many(data_source).execute()\n\n# Query for similar documents\nquestion = \"Example query\"\nquestion_embedding = client.embeddings.create(input=question, model=\"text-embedding-3-small\").data&#91;0].embedding\nrelated_docs = DocModel.select(DocModel.text, DocModel.embedding.cosine_distance(question_embedding).alias(\"distance\")).order_by(SQL(\"distance\")).limit(3)\n\n# Output the results\nfor doc in related_docs:\n    print(doc.distance, doc.text)\n\ndb.close()<\/code><\/pre>\n\n\n\n<p>This approach demonstrates how TiDB&#8217;s vector search capability can be integrated into AI applications, leveraging the power of embeddings and similarity searches to deliver more intelligent and context-aware solutions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>TiDB&#8217;s vector search feature bridges the gap between traditional relational databases and modern AI-driven applications. By allowing users to manage vectors in the same way they operate on MySQL data, TiDB offers a flexible and powerful platform for developing innovative solutions that require both robust data management and advanced search capabilities. Whether you are building semantic search engines, recommendation systems, or any application that benefits from understanding the meaning behind data, TiDB&#8217;s vector search provides the tools you need to succeed.<\/p>\n\n\n\n<p>For more information and to start using TiDB vector search, visit <a href=\"https:\/\/tidb.cloud\/ai\">TiDB Cloud<\/a> and join the waitlist to get access to this exciting feature.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">More Demos<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/openai_embedding\/README.md\">OpenAI Embedding<\/a>: use the OpenAI embedding model to generate vectors for text data.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/image_search\/README.md\">Image Search<\/a>: use the OpenAI CLIP model to generate vectors for image and text.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/llamaindex-tidb-vector-with-ui\/README.md\">LlamaIndex RAG with UI<\/a>: use the LlamaIndex to build an <a href=\"https:\/\/docs.llamaindex.ai\/en\/latest\/getting_started\/concepts\/\">RAG(Retrieval-Augmented Generation)<\/a> application.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/llamaindex-tidb-vector\/README.md\">Chat with URL<\/a>: use LlamaIndex to build an <a href=\"https:\/\/docs.llamaindex.ai\/en\/latest\/getting_started\/concepts\/\">RAG(Retrieval-Augmented Generation)<\/a> application that can chat with a URL.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/graphrag-demo\/README.md\">GraphRAG<\/a>: 20 lines code of using TiDB Serverless to build a Knowledge Graph based RAG application.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/graphrag-step-by-step-tutorial\/README.md\">GraphRAG Step by Step Tutorial<\/a>: Step by step tutorial to build a Knowledge Graph based RAG application with Colab notebook. In this tutorial, you will learn how to extract knowledge from a text corpus, build a Knowledge Graph, store the Knowledge Graph in TiDB Serverless, and search from the Knowledge Graph.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>As the landscape of data management continues to evolve, the integration of vector search capabilities into traditional relational databases has emerged as a powerful advancement. TiDB Serverless, now offers a vector search feature, allowing users to manage vectors in a manner akin to operating on MySQL data. This seamless integration of vector search within a [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":0,"template":"","class_list":["post-17493","article","type-article","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Managing Vectors in the Same Way as Operating on MySQL Data<\/title>\n<meta name=\"description\" content=\"In this blog, we show demos how to manage vector data like that in MySQL with SQL grammar, without new skills.\" \/>\n<meta name=\"robots\" content=\"noindex, follow\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Managing Vectors in the Same Way as Operating on MySQL Data\" \/>\n<meta property=\"og:description\" content=\"In this blog, we show demos how to manage vector data like that in MySQL with SQL grammar, without new skills.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pingcap.com\/ko\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/\" \/>\n<meta property=\"og:site_name\" content=\"TiDB\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"article:modified_time\" content=\"2024-06-05T03:28:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2024\/09\/11005522\/Homepage-Ad.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1440\" \/>\n\t<meta property=\"og:image:height\" content=\"714\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/\",\"url\":\"https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/\",\"name\":\"Managing Vectors in the Same Way as Operating on MySQL Data\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/#website\"},\"datePublished\":\"2024-06-05T03:23:08+00:00\",\"dateModified\":\"2024-06-05T03:28:37+00:00\",\"description\":\"In this blog, we show demos how to manage vector data like that in MySQL with SQL grammar, without new skills.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pingcap.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Articles\",\"item\":\"https:\/\/www.pingcap.com\/article\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Managing Vectors in the Same Way as Operating on MySQL Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pingcap.com\/#website\",\"url\":\"https:\/\/www.pingcap.com\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at Scale\",\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pingcap.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pingcap.com\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\/\/www.pingcap.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/facebook.com\/pingcap2015\",\"https:\/\/x.com\/PingCAP\",\"https:\/\/linkedin.com\/company\/pingcap\",\"https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Managing Vectors in the Same Way as Operating on MySQL Data","description":"In this blog, we show demos how to manage vector data like that in MySQL with SQL grammar, without new skills.","robots":{"index":"noindex","follow":"follow"},"og_locale":"ko_KR","og_type":"article","og_title":"Managing Vectors in the Same Way as Operating on MySQL Data","og_description":"In this blog, we show demos how to manage vector data like that in MySQL with SQL grammar, without new skills.","og_url":"https:\/\/www.pingcap.com\/ko\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","article_modified_time":"2024-06-05T03:28:37+00:00","og_image":[{"width":1440,"height":714,"url":"https:\/\/static.pingcap.com\/files\/2024\/09\/11005522\/Homepage-Ad.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@PingCAP","twitter_misc":{"Est. reading time":"5\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/","url":"https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/","name":"Managing Vectors in the Same Way as Operating on MySQL Data","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"datePublished":"2024-06-05T03:23:08+00:00","dateModified":"2024-06-05T03:28:37+00:00","description":"In this blog, we show demos how to manage vector data like that in MySQL with SQL grammar, without new skills.","breadcrumb":{"@id":"https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.pingcap.com\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingcap.com\/"},{"@type":"ListItem","position":2,"name":"Articles","item":"https:\/\/www.pingcap.com\/article\/"},{"@type":"ListItem","position":3,"name":"Managing Vectors in the Same Way as Operating on MySQL Data"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"\ud2f0DB","description":"TiDB | SQL at Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]}]}},"card_markup":"        <a class=\"card-article\" href=\"https:\/\/www.pingcap.com\/ko\/article\/managing-vectors-in-the-same-way-as-operating-on-mysql-data\/\">            <h3>Managing Vectors in the Same Way as Operating on MySQL Data<\/h3>            <p>As the landscape of data management continues to evolve, the integration of vector search capabilities into traditional relational databases has emerged as a powerful advancement. TiDB Serverless, now offers a vector search feature, allowing users to manage vectors in a manner akin to operating on MySQL data. This seamless integration of vector search within a [&hellip;]<\/p>        <\/a>","_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/article\/17493","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/article"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/article"}],"author":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/users\/8"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=17493"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}