{"id":17026,"date":"2024-05-21T22:15:22","date_gmt":"2024-05-22T05:15:22","guid":{"rendered":"https:\/\/www.pingcap.com\/?post_type=article&#038;p=17026"},"modified":"2024-06-03T01:23:17","modified_gmt":"2024-06-03T08:23:17","slug":"building-a-retrieval-augmented-generation-application-with-llamaindex-and-mysql-compatible-database","status":"publish","type":"article","link":"https:\/\/www.pingcap.com\/ko\/article\/building-a-retrieval-augmented-generation-application-with-llamaindex-and-mysql-compatible-database\/","title":{"rendered":"Building a Retrieval-Augmented Generation (RAG) Application with LlamaIndex and MySQL-Compatible Database TiDB Serverless"},"content":{"rendered":"<p>In the age of Generative AI (GenAI) and Large Language Models (LLMs), MySQL users have faced challenges in building AI applications such as Retrieval-Augmented Generation (RAG) apps due to the lack of native vector storage and essential vector functions, as well as performance and scalability limits. However, with <a href=\"https:\/\/tidb.cloud\/\">TiDB Serverless<\/a>, a MySQL-compatible database that includes <a href=\"https:\/\/www.pingcap.com\/ko\/blog\/integrating-vector-search-into-tidb-for-ai-applications\/\">vector storage<\/a>, these challenges can be overcome. 
This tutorial guides you through building a RAG app using LlamaIndex and the built-in vector storage of TiDB Serverless.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Prerequisites\"><\/span>Prerequisites<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>To get started, ensure you have the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A running TiDB Serverless cluster with vector search enabled<\/li>\n\n\n\n<li>Python 3.8 or later<\/li>\n\n\n\n<li>An OpenAI API key<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Step-by-Step_Guide\"><\/span>Step-by-Step Guide<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h5 class=\"wp-block-heading\">1. Clone the Repository<\/h5>\n\n\n\n<p>First, clone the example repository from GitHub:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>git clone https:\/\/github.com\/pingcap\/tidb-vector-python.git<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">2. Set Up a Virtual Environment<\/h5>\n\n\n\n<p>Navigate to the example's directory and set up a virtual environment:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cd tidb-vector-python\/examples\/llamaindex-tidb-vector\npython3 -m venv .venv\nsource .venv\/bin\/activate<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">3. 
Install Dependencies<\/h5>\n\n\n\n<p>Install the required dependencies:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install -r requirements.txt<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">4. Set Environment Variables<\/h5>\n\n\n\n<p>Set the environment variables that hold your OpenAI API key and your TiDB Serverless connection details:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>export OPENAI_API_KEY=\"your_openai_api_key\"\nexport TIDB_HOST=\"your_tidb_host\"\nexport TIDB_USERNAME=\"your_tidb_username\"\nexport TIDB_PASSWORD=\"your_tidb_password\"<\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\">5. Run the Example<\/h5>\n\n\n\n<p>To see the application in action, run the provided script:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python chat_with_url.py --url \"https:\/\/docs.pingcap.com\/tidb\/stable\/overview\"<\/code><\/pre>\n\n\n\n<p>The script loads the page at the given URL, stores embeddings of its content in TiDB Serverless, and then prompts you for questions, which it answers using the stored content.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Example_Code_Breakdown\"><\/span>Example Code Breakdown<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Here\u2019s the complete chat_with_url.py script:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import os\nimport click\nfrom sqlalchemy import URL\nfrom llama_index.core import VectorStoreIndex, StorageContext\nfrom llama_index.vector_stores.tidbvector import TiDBVectorStore\nfrom llama_index.readers.web import SimpleWebPageReader\n\n# Configure the connection URL: TiDB Serverless speaks the MySQL protocol on port 4000,\n# and the query options tell PyMySQL to verify the server's TLS certificate and hostname\ntidb_connection_url = URL(\n    \"mysql+pymysql\",\n    username=os.environ&#91;'TIDB_USERNAME'],\n    password=os.environ&#91;'TIDB_PASSWORD'],\n    host=os.environ&#91;'TIDB_HOST'],\n    port=4000,\n    database=\"test\",\n    query={\"ssl_verify_cert\": True, \"ssl_verify_identity\": True},\n)\n\n# Set up the vector store: embeddings are kept in the llama_index_rag_test table\n# and matched by cosine distance\ntidbvec = TiDBVectorStore(\n    connection_string=tidb_connection_url,\n    table_name=\"llama_index_rag_test\",\n    distance_strategy=\"cosine\",\n    vector_dimension=1536,  # 
Must match the embedding model's output size (1536 for OpenAI's text-embedding-ada-002)\n    drop_existing_table=False,  # Keep vectors stored by earlier runs\n)\n\n# Wrap the TiDB table as a LlamaIndex index and build a streaming query engine on top of it\ntidb_vec_index = VectorStoreIndex.from_vector_store(tidbvec)\nstorage_context = StorageContext.from_defaults(vector_store=tidbvec)\nquery_engine = tidb_vec_index.as_query_engine(streaming=True)\n\ndef do_prepare_data(url):\n    # Fetch the page and convert its HTML to plain text\n    documents = SimpleWebPageReader(html_to_text=True).load_data(&#91;url])\n    # Split and embed the text, then write the vectors into TiDB via the shared storage context\n    VectorStoreIndex.from_documents(documents, storage_context=storage_context, show_progress=True)\n\n_default_url = 'https:\/\/docs.pingcap.com\/tidb\/stable\/overview'\n\n@click.command()\n@click.option('--url', default=_default_url, help=f'URL you want to talk to, default={_default_url}')\ndef chat_with_url(url):\n    do_prepare_data(url)\n    while True:\n        question = click.prompt(\"Enter your question\")\n        # Retrieve the most similar chunks from TiDB and have the LLM answer from them\n        response = query_engine.query(question)\n        click.echo(response)\n\nif __name__ == '__main__':\n    chat_with_url()<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>More AI Application Examples with TiDB Serverless Vector Storage<\/strong><\/h4>\n\n\n\n<p>The tidb-vector-python repository contains several more examples built on TiDB Serverless vector storage. 
Here are a few of them:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/openai_embedding\/README.md\">OpenAI Embedding<\/a>: generate vectors for text data with an OpenAI embedding model.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/image_search\/README.md\">Image Search<\/a>: generate vectors for images and text with the OpenAI CLIP model.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/llamaindex-tidb-vector-with-ui\/README.md\">LlamaIndex RAG with UI<\/a>: use LlamaIndex to build a <a href=\"https:\/\/docs.llamaindex.ai\/en\/latest\/getting_started\/concepts\/\">RAG (Retrieval-Augmented Generation)<\/a> application with a user interface.<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/pingcap\/tidb-vector-python\/blob\/main\/examples\/llamaindex-tidb-vector\/README.md\">Chat with URL<\/a>: the example covered in this tutorial, which uses LlamaIndex to build a <a href=\"https:\/\/docs.llamaindex.ai\/en\/latest\/getting_started\/concepts\/\">RAG (Retrieval-Augmented Generation)<\/a> application that answers questions about a web page.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The built-in vector storage in TiDB Serverless lets you build AI applications on top of semantic similarity search. This tutorial covered connecting LlamaIndex to a TiDB Serverless cluster, storing web page embeddings in a vector table managed by TiDBVectorStore, and answering questions with vector search. 
For more detailed documentation and advanced features, refer to the <a href=\"https:\/\/tidb.cloud\/ai\">TiDB Vector Search Documentation<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Additional Resources<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/discord.gg\/zcqexutz2R\">Join the TiDB community on Discord<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/tidb.support.pingcap.com\/\">TiDB Support Portal<\/a><\/li>\n<\/ul>\n\n\n\n<p>Feel free to reach out on Discord or the TiDB Support Portal for assistance. Happy coding!<\/p>","protected":false},"author":8,"featured_media":0,"template":"","class_list":["post-17026","article","type-article","status-publish","hentry"],"acf":[]}