{"id":17399,"date":"2024-05-30T02:45:24","date_gmt":"2024-05-30T09:45:24","guid":{"rendered":"https:\/\/www.pingcap.com\/?post_type=article&#038;p=17399"},"modified":"2024-06-03T01:17:20","modified_gmt":"2024-06-03T08:17:20","slug":"vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless","status":"publish","type":"article","link":"https:\/\/www.pingcap.com\/ko\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/","title":{"rendered":"Vector Database Scalability: A Comparative Analysis of pgvector and TiDB Serverless Vector Storage"},"content":{"rendered":"<p>In the era of artificial intelligence (AI) and big data, vector databases represent a significant evolution in database technology. These databases, designed to efficiently store and query high-dimensional vector data, are crucial for AI applications such as semantic search, recommendation systems, and similarity searches. Among the notable technologies in this space are <a href=\"https:\/\/github.com\/pgvector\/pgvector\">pgvector<\/a> and <a href=\"https:\/\/tidb.cloud\/ai\">TiDB Serverless Vector Storage<\/a>, each offering scalable solutions for handling complex vector-based data queries. This article provides a comparative analysis of the scalability of these two vector databases, guiding enthusiasts and professionals in selecting the appropriate technology for their needs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Understanding_Vector_Databases\"><\/span>Understanding Vector Databases<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Vector databases specialize in storing and managing vector data, which represents items in high-dimensional space. These vectors, often derived from unstructured data such as images, text, or videos through machine learning models, capture the essence of the data in a format efficiently processed by the database. 
The core functionality of a vector database hinges on its ability to perform fast nearest neighbor searches, identifying the closest vectors to a given query vector, facilitating tasks like similarity searches or recommendations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Scalability Challenge<\/h3>\n\n\n\n<p>With exploding data volumes and increasing query complexities, scalability is a paramount concern for vector databases. Scalability refers to the database&#8217;s ability to handle growing amounts of data and an increasing number of queries without a proportional increase in latency or resource consumption. It includes the ability to scale up\u2014increasing resources for a single system\u2014and scale out, distributing data and queries across multiple machines.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"pgvector_Leveraging_PostgreSQL_for_Vector_Data\"><\/span>pgvector: Leveraging PostgreSQL for Vector Data<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>pgvector is an extension for PostgreSQL, one of the most popular open-source relational database systems. It introduces vector data types and indexing capabilities into PostgreSQL, allowing it to store and query high-dimensional vectors. The scalability of pgvector is inherently tied to that of PostgreSQL. While PostgreSQL excels in flexibility and features, its architecture poses certain limitations when scaling out.<\/p>\n\n\n\n<p>pgvector benefits from PostgreSQL&#8217;s robustness and wide array of features but inherits its challenges in horizontal scalability. 
Traditionally, scaling PostgreSQL involves increasing hardware resources (scale-up) or implementing sharding manually (scale-out), which can be complex and may not evenly distribute the workload across shards.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"TiDB_Serverless_Vector_Storage_A_Distributed_Approach\"><\/span>TiDB Serverless Vector Storage: A Distributed Approach<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>TiDB Serverless Vector Storage is an integral component of the TiDB ecosystem, a distributed SQL database system that is MySQL-compatible and open-source. Known for its horizontal scalability, TiDB&#8217;s architecture naturally accommodates scale-out strategies by distributing data and computation across multiple nodes in a cluster. This design philosophy extends to TiDB Serverless <a href=\"https:\/\/www.pingcap.com\/ko\/blog\/integrating-vector-search-into-tidb-for-ai-applications\/\">Vector Storage<\/a>, providing inherent advantages in handling massive volumes of vector data and efficiently distributing vector query processing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Key Scalability Features of TiDB Serverless Vector Storage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Horizontal Scaling<\/strong>: TiDB Serverless Vector Storage scales out seamlessly by adding more nodes to the cluster, automatically rebalancing vector data among the nodes to optimize query performance.<\/li>\n\n\n\n<li><strong>Distributed Query Processing<\/strong>: Queries on vector data are intelligently split and concurrently processed across multiple nodes, leveraging TiDB&#8217;s distributed computing capabilities to reduce query latency.<\/li>\n\n\n\n<li><strong>Resource Isolation<\/strong>: By decoupling storage and compute resources, TiDB Serverless Vector Storage ensures that intensive vector query processing does not impact the overall performance of transactional workloads on the same TiDB 
cluster.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Comparative_Analysis\"><\/span>Comparative Analysis<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>When comparing pgvector and TiDB Serverless Vector Storage in terms of scalability, several factors stand out:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ease of Scaling<\/strong>: TiDB Serverless Vector Storage offers a more straightforward path to scale out, as its underlying distributed architecture is designed for this purpose from the ground up. Conversely, scaling pgvector involves PostgreSQL&#8217;s more traditional and sometimes cumbersome scaling techniques.<\/li>\n\n\n\n<li><strong>Query Processing<\/strong>: TiDB Serverless Vector Storage&#8217;s distributed query processing can significantly reduce query latencies for vector data, especially in large-scale scenarios. pgvector, limited by PostgreSQL&#8217;s single-node query execution model, might not achieve the same level of efficiency in processing complex vector queries across large datasets.<\/li>\n\n\n\n<li><strong>Data and Workload Management<\/strong>: The ability to automatically rebalance data and efficiently manage mixed workloads (transactional and analytical) gives TiDB Serverless Vector Storage an edge in dynamic environments where data volumes and query patterns fluctuate.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>While both pgvector and TiDB Serverless Vector Storage present compelling solutions for integrating vector data capabilities into relational database systems, their scalability characteristics differ markedly due to their underlying architectures. 
For applications that require robust horizontal scaling and can benefit from distributed query processing, particularly those in the realms of AI and big data, TiDB Serverless Vector Storage emerges as a highly scalable and efficient choice. Meanwhile, pgvector offers a valuable extension for PostgreSQL users, enabling vector data management within a familiar ecosystem, albeit with certain scalability limitations inherent to traditional relational databases.<\/p>\n\n\n\n<p>In navigating the choice between pgvector and TiDB Serverless Vector Storage, organizations and developers must consider their specific scalability requirements, existing technology stack, and the strategic importance of vector data processing to their operations. Adopting the right vector database technology is crucial.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"One_more_thing\"><\/span>One more thing<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>You can get started with TiDB Serverless Vector Storage for free <a href=\"https:\/\/tidb.cloud\/ai\">here<\/a>, with some examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/openai_embedding\/README.md\">OpenAI Embedding<\/a>: use the OpenAI embedding model to generate vectors for text data.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/image_search\/README.md\">Image Search<\/a>: use the OpenAI CLIP model to generate vectors for images and text.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/llamaindex-tidb-vector-with-ui\/README.md\">LlamaIndex RAG with UI<\/a>: use LlamaIndex to build a\u00a0<a href=\"https:\/\/docs.llamaindex.ai\/en\/latest\/getting_started\/concepts\/\">RAG (Retrieval-Augmented Generation)<\/a>\u00a0application.<\/li>\n\n\n\n<li><a 
href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/llamaindex-tidb-vector\/README.md\">Chat with URL<\/a>: use LlamaIndex to build a\u00a0<a href=\"https:\/\/docs.llamaindex.ai\/en\/latest\/getting_started\/concepts\/\">RAG (Retrieval-Augmented Generation)<\/a>\u00a0application that can chat with a URL.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/graphrag-demo\/README.md\">GraphRAG<\/a>: use TiDB Serverless to build a Knowledge Graph based RAG application in 20 lines of code.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/graphrag-step-by-step-tutorial\/README.md\">GraphRAG Step by Step Tutorial<\/a>: a step-by-step tutorial for building a Knowledge Graph based RAG application with a Colab notebook. In this tutorial, you will learn how to extract knowledge from a text corpus, build a Knowledge Graph, store the Knowledge Graph in TiDB Serverless, and search the Knowledge Graph.<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>In the era of artificial intelligence (AI) and big data, vector databases represent a significant evolution in database technology. These databases, designed to efficiently store and query high-dimensional vector data, are crucial for AI applications such as semantic search, recommendation systems, and similarity searches. Among the notable technologies in this space are pgvector and TiDB [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":0,"template":"","class_list":["post-17399","article","type-article","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Vector Database Scalability: pgvector vs. 
TiDB Vector Storage<\/title>\n<meta name=\"description\" content=\"Choose between pgvector and TiDB Vector Storage by evaluating scalability, tech stack, and the strategic value of vector data processing.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pingcap.com\/ko\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Vector Database Scalability: pgvector vs. TiDB Vector Storage\" \/>\n<meta property=\"og:description\" content=\"Choose between pgvector and TiDB Vector Storage by evaluating scalability, tech stack, and the strategic value of vector data processing.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pingcap.com\/ko\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/\" \/>\n<meta property=\"og:site_name\" content=\"TiDB\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"article:modified_time\" content=\"2024-06-03T08:17:20+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2024\/09\/11005522\/Homepage-Ad.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1440\" \/>\n\t<meta property=\"og:image:height\" content=\"714\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/\",\"url\":\"https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/\",\"name\":\"Vector Database Scalability: pgvector vs. TiDB Vector Storage\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/#website\"},\"datePublished\":\"2024-05-30T09:45:24+00:00\",\"dateModified\":\"2024-06-03T08:17:20+00:00\",\"description\":\"Choose between pgvector and TiDB Vector Storage by evaluating scalability, tech stack, and the strategic value of vector data processing.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pingcap.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Articles\",\"item\":\"https:\/\/www.pingcap.com\/article\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Vector Database Scalability: A Comparative Analysis of pgvector and TiDB Serverless Vector Storage\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pingcap.com\/#website\",\"url\":\"https:\/\/www.pingcap.com\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at 
Scale\",\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pingcap.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pingcap.com\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\/\/www.pingcap.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/facebook.com\/pingcap2015\",\"https:\/\/x.com\/PingCAP\",\"https:\/\/linkedin.com\/company\/pingcap\",\"https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Vector Database Scalability: pgvector vs. TiDB Vector Storage","description":"Choose between pgvector and TiDB Vector Storage by evaluating scalability, tech stack, and the strategic value of vector data processing.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pingcap.com\/ko\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/","og_locale":"ko_KR","og_type":"article","og_title":"Vector Database Scalability: pgvector vs. 
TiDB Vector Storage","og_description":"Choose between pgvector and TiDB Vector Storage by evaluating scalability, tech stack, and the strategic value of vector data processing.","og_url":"https:\/\/www.pingcap.com\/ko\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","article_modified_time":"2024-06-03T08:17:20+00:00","og_image":[{"width":1440,"height":714,"url":"https:\/\/static.pingcap.com\/files\/2024\/09\/11005522\/Homepage-Ad.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@PingCAP","twitter_misc":{"Est. reading time":"5\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/","url":"https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/","name":"Vector Database Scalability: pgvector vs. 
TiDB Vector Storage","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"datePublished":"2024-05-30T09:45:24+00:00","dateModified":"2024-06-03T08:17:20+00:00","description":"Choose between pgvector and TiDB Vector Storage by evaluating scalability, tech stack, and the strategic value of vector data processing.","breadcrumb":{"@id":"https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.pingcap.com\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingcap.com\/"},{"@type":"ListItem","position":2,"name":"Articles","item":"https:\/\/www.pingcap.com\/article\/"},{"@type":"ListItem","position":3,"name":"Vector Database Scalability: A Comparative Analysis of pgvector and TiDB Serverless Vector Storage"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"\ud2f0DB","description":"TiDB | SQL at 
Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]}]}},"card_markup":"        <a class=\"card-article\" href=\"https:\/\/www.pingcap.com\/ko\/article\/vector-database-scalability-a-comparative-analysis-of-pgvector-and-tidb-serverless\/\">            <h3>Vector Database Scalability: A Comparative Analysis of pgvector and TiDB Serverless Vector Storage<\/h3>            <p>In the era of artificial intelligence (AI) and big data, vector databases represent a significant evolution in database technology. These databases, designed to efficiently store and query high-dimensional vector data, are crucial for AI applications such as semantic search, recommendation systems, and similarity searches. 
Among the notable technologies in this space are pgvector and TiDB [&hellip;]<\/p>        <\/a>","_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/article\/17399","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/article"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/article"}],"author":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/users\/8"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=17399"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}