{"id":7642,"date":"2022-07-18T03:44:59","date_gmt":"2022-07-18T10:44:59","guid":{"rendered":"https:\/\/en.pingcap.com\/?p=7642"},"modified":"2024-12-20T04:04:48","modified_gmt":"2024-12-20T12:04:48","slug":"the-long-expedition-toward-making-a-real-time-htap-database","status":"publish","type":"post","link":"https:\/\/www.pingcap.com\/ko\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/","title":{"rendered":"The Long Expedition toward Making a Real-Time HTAP Database\u00a0"},"content":{"rendered":"\n<p>Recently, Hybrid Transactional and Analytical Processing (HTAP) has become a hot topic as large players like Google (AlloyDB), Snowflake (Unistore) and Oracle (HeatWave) have joined the game. But still, many people don\u2019t know what makes an HTAP database.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.pingcap.com\/tidb\/\">TiDB<\/a> is an open-source, distributed HTAP database. Thousands of companies in different industries have benefited from TiDB\u2019s HTAP abilities over the years. However, building an HTAP database is quite a long expedition,&nbsp;and we are still on our way.&nbsp;<\/p>\n\n\n\n<p>I am one of the designers and developers of TiDB\u2019s HTAP architecture. 
In this post,&nbsp;I\u2019ll share some stories behind our HTAP design decisions and how we learned from our customers and built an HTAP database trusted and recognized by customers, researchers, and developers.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Where_the_HTAP_dream_begins\"><\/span>Where the HTAP dream begins&nbsp;<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Earlier TiDB versions, mainly those before TiDB 2.0, were designed to be strong Online Transactional Processing (OLTP) databases with horizontal scalability, high availability, strong consistency, and MySQL compatibility.&nbsp;<\/p>\n\n\n\n<p>The figure below shows TiDB\u2019s original architecture with three key components: the TiDB server, the Placement Driver (PD), and the TiKV server. The architecture did not include later features such as a columnar engine, massively parallel processing (MPP) architecture, and vectorization.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"703\" height=\"519\" src=\"https:\/\/www.pingcap.com\/core\/uploads\/2022\/07\/dd2334724b46b1fb096244a7e1347c60.jpeg\" alt=\"\" class=\"wp-image-7645\" style=\"width:596px;height:440px\" srcset=\"https:\/\/static.pingcap.com\/files\/2022\/07\/dd2334724b46b1fb096244a7e1347c60.jpeg 703w, https:\/\/static.pingcap.com\/files\/2022\/07\/dd2334724b46b1fb096244a7e1347c60-300x221.jpeg 300w\" sizes=\"auto, (max-width: 703px) 100vw, 703px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"has-text-align-center\"><em>TiDB\u2019s original architecture<\/em><\/p>\n\n\n\n<p>PD was the brain of the whole database system, which scheduled the storage nodes and kept their workloads balanced. TiKV was the storage engine, which stored data in a distributed way and used a row format with cross-node ACID transaction support. 
The TiDB server was the stateless SQL compute engine with a classic volcano execution design.&nbsp;<\/p>\n\n\n\n<p>When we tried to get early adopters, we found most users were hesitant to use a brand new database in their mission-critical transactional cases. Instead, they were inclined to use TiDB as their backup database for analytical workloads. Conversations like the one below happened many times:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>&#8220;<em><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-black-color\">This is an edgy distributed design \u2026 I bet you don&#8217;t want to miss the amazing experience with no sharding<\/mark>.<\/em>&#8221; We tried to persuade our potential customers to adopt TiDB.&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li>&#8220;<em>Hmm,\u201d they replied.&nbsp; \u201cSounds interesting. Can we use it as a read-only replica for analytical workloads first? Let&#8217;s see how it works.<\/em>&#8221;<\/li>\n<\/ul>\n\n\n\n<p><strong>We implemented the coprocessor framework in TiDB to speed up analytical queries.<\/strong> It allowed limited computation such as aggregations and filtering to be pushed down to TiKV nodes and executed in a distributed manner. 
This framework worked very well and gained us more TiDB adopters.&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"922\" height=\"479\" src=\"https:\/\/www.pingcap.com\/core\/uploads\/2022\/07\/2e550ceb080877b984688af60c1f8a50.jpeg\" alt=\"\" class=\"wp-image-7646\" style=\"width:707px;height:367px\" srcset=\"https:\/\/static.pingcap.com\/files\/2022\/07\/2e550ceb080877b984688af60c1f8a50.jpeg 922w, https:\/\/static.pingcap.com\/files\/2022\/07\/2e550ceb080877b984688af60c1f8a50-300x156.jpeg 300w, https:\/\/static.pingcap.com\/files\/2022\/07\/2e550ceb080877b984688af60c1f8a50-768x399.jpeg 768w\" sizes=\"auto, (max-width: 922px) 100vw, 922px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"has-text-align-center\"><em>TiKV coprocessor&nbsp;<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Thank_you_Apache_Spark\"><\/span>Thank you, Apache Spark!<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>It was all good in the beginning, until we encountered more complicated analytical scenarios. Our customers told us that TiDB worked very well in OLTP scenarios, but was \u201ca bit slow\u201d in Online Analytical Processing (OLAP) scenarios, especially when they used TiDB to analyze a large volume of data and perform big JOINs. Also, TiDB did not work well with their big data ecosystem.&nbsp;<\/p>\n\n\n\n<p><strong>In short, the problem was that TiDB\u2019s computation power did not match its scalable storage<\/strong>. <\/p>\n\n\n\n<p>TiDB&#8217;s storage system could scale, but the TiDB server, the computational component, could not. In OLTP scenarios, this problem could be fixed by adding multiple TiDB servers on top of TiKV. But in OLAP scenarios, each query could be very large. Since TiDB did not have an MPP architecture, TiDB servers could not share a single query workload. Operations like large JOINs became unacceptably slow. 
We urgently needed a scalable computation layer that could shuffle data around and work with the scalable storage layer together to deal with large queries.&nbsp;<\/p>\n\n\n\n<p>To fix this problem, we either needed to have our own MPP framework or we had to leverage an external engine. We only had a small team then,<strong> so in TiDB 3.0 we decided to leverage Apache Spark, a well-implemented and unified computation engine, and built a Spark plugin called TiSpark on top of TiKV.<\/strong> It included a TiKV client, a TiDB compatible type system, some coprocessor specific physical operators, and a plan rewriter. TiSpark partially converts the Spark SQL plan into the coprocessor plan, gathers results from TiKV, and finishes the computing in the native Spark engine. Thanks to Spark\u2019s flexible extension framework, all this was achieved without changing a single line of Spark code.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"574\" src=\"https:\/\/static.pingcap.com\/files\/2023\/04\/14010934\/image-70-1024x574.png\" alt=\"\" class=\"wp-image-11609\" srcset=\"https:\/\/static.pingcap.com\/files\/2023\/04\/14010934\/image-70-1024x574.png 1024w, https:\/\/static.pingcap.com\/files\/2023\/04\/14010934\/image-70-300x168.png 300w, https:\/\/static.pingcap.com\/files\/2023\/04\/14010934\/image-70-768x431.png 768w, https:\/\/static.pingcap.com\/files\/2023\/04\/14010934\/image-70-1440x808.png 1440w, https:\/\/static.pingcap.com\/files\/2023\/04\/14010934\/image-70.png 1530w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>TiSpark architecture<\/em><\/p>\n\n\n\n<p>TiSpark empowered TiDB in large scale analytical scenarios; but for small- to medium-sized queries, it didn\u2019t work very well. To fix this problem, <strong>we also improved TiDB\u2019s native compute engine. 
<\/strong>We changed TiDB\u2019s optimizer from rule-based to cost-based, and also improved the just-in-time (JIT) design.&nbsp;<\/p>\n\n\n\n<p>One more thing: <strong>TiSpark bridged the gap between TiDB and big data ecosystems. <\/strong>In many scenarios, TiDB serves as the warm data platform between the OLTP layer and data lakes due to its seamless integration of Spark.<\/p>\n\n\n\n<div class=\"trackable-btns\">\n  <a href=\"\/download\" onclick=\"trackViews('The Long Expedition toward Making a Real-Time HTAP Database', 'download-tidb-btn-middle')\"><button>Download TiDB<\/button><\/a>\n  <a href=\"https:\/\/share.hsforms.com\/1e2W03wLJQQKPd1d9rCbj_Q2npzm\" onclick=\"trackViews('The Long Expedition toward Making a Real-Time HTAP Database', 'subscribe-blog-btn-middle')\"><button>Subscribe to Blog<\/button><\/a>\n  <a href=\"\/contact-us\" onclick=\"trackViews('The Long Expedition toward Making a Real-Time HTAP Database', 'contact-us-middle')\"><button>Request a Demo<\/button><\/a>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"No_columnar_store_no_HTAP\"><\/span><strong>No columnar store, no HTAP&nbsp;<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Let\u2019s revisit what we had in TiDB 3.0. We had coprocessors on top of the storage layer, a cost-based optimizer with some smart operators, a single node vectorized compute engine, and a Spark accelerator. Despite all this, TiDB was still not a true HTAP database.<\/p>\n\n\n\n<p>First,<strong> TiDB didn\u2019t have a columnar storage engine, which was crucial for analytics. <\/strong>Second, <strong>TiDB could not support workload isolation, and workload interference happened often.<\/strong> In the case of ZTO Express, one of the largest logistics companies in the world, the customer had to reserve quite a lot of TiKV resources for TiSpark due to the inefficiency of the row format and bad workload isolation. 
Performance profiling showed that TiSpark on TiKV burned unnecessary I\/O bandwidth and CPU on the unused columns. In addition, ZTO Express had to add quite a few extra TiKV nodes to lower the peak resource utilization and leave enough safety space for OLTP workloads. Otherwise, the highly concurrent queries for individual packages would be greatly impacted by reporting and cause unstable performance.<\/p>\n\n\n\n<p>We tried things like thread pool and some \u201csmart\u201d scheduling techniques, but they were risky and not flexible enough to put two workloads in the same machine. After some failed experiments, we built a special component that alleviated both these headaches: <strong>TiFlash, a distributed columnar store that replicated data from TiKV in a columnar format.&nbsp;<\/strong><\/p>\n\n\n\n<p>TiFlash looked a bit different than it does today. The TiFlash prototype was on top of Ceph, an open-source, software-defined storage platform, with the change data capture (CDC) as the data replication channel. This is similar to Snowflake on top of the object storage. I will not call this a \u201cwrong decision\u201d since we still add object storage support, but it was too aggressive for most TiDB users.&nbsp; At that time, they favored on-premises solutions, and Ceph was too cumbersome if added to our product.&nbsp;<\/p>\n\n\n\n<p><strong>After some unsuccessful user trials, we turned to a Raft learner-based design: TiFlash replicating data from TiKV via the Raft protocol as a non-voting role. <\/strong>Users could dedicate different machines to run the analytical engine and make the asynchronous replication from the OLTP engine. With the help of the Raft protocol, TiFlash could also provide consistent snapshot reads by checking replication progress as well as multiversion concurrency control (MVCC). <strong>This led to complete workload isolation.&nbsp;<\/strong><\/p>\n\n\n\n<p>The diagram below shows the architecture of TiKV and TiFlash. 
The lower left shows the TiFlash storage layer, and the lower right shows the TiKV storage layer. Data was written into the TiDB server and then was synced from TiKV to TiFlash via the Raft learner protocol. The async replication did not impact the normal OLTP workloads in TiKV.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"849\" height=\"413\" src=\"https:\/\/static.pingcap.com\/files\/2023\/04\/14013139\/image-73-1024x576-1.jpg\" alt=\"\" class=\"wp-image-11613\" srcset=\"https:\/\/static.pingcap.com\/files\/2023\/04\/14013139\/image-73-1024x576-1.jpg 849w, https:\/\/static.pingcap.com\/files\/2023\/04\/14013139\/image-73-1024x576-1-300x146.jpg 300w, https:\/\/static.pingcap.com\/files\/2023\/04\/14013139\/image-73-1024x576-1-768x374.jpg 768w\" sizes=\"auto, (max-width: 849px) 100vw, 849px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>TiFlash and TiKV architecture&nbsp;<\/em><\/p>\n\n\n\n<p><strong>TiFlash had an updatable column store. <\/strong>In general, a columnar store is not well suited to online updates based on primary keys. Traditional data warehouses or databases typically support only batch updates every hour or day. To solve this problem, <strong>we introduced a delta tree, <\/strong>a new design that can be seen as a combination of a B+ tree and a log-structured merge (LSM) tree. However, the delta tree has larger leaf nodes than a B+ tree, and it has double the layers of an LSM tree. 
It divides the column engine into a write-optimized area and a read-optimized area.&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"395\" src=\"https:\/\/www.pingcap.com\/core\/uploads\/2022\/07\/image-8-1024x395.png\" alt=\"\" class=\"wp-image-7647\" style=\"width:788px;height:304px\" srcset=\"https:\/\/static.pingcap.com\/files\/2022\/07\/image-8-1024x395.png 1024w, https:\/\/static.pingcap.com\/files\/2022\/07\/image-8-300x116.png 300w, https:\/\/static.pingcap.com\/files\/2022\/07\/image-8-768x296.png 768w, https:\/\/static.pingcap.com\/files\/2022\/07\/image-8.png 1266w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"has-text-align-center\"><em>LSM tree vs delta tree<\/em><\/p>\n\n\n\n<p><strong>With the inception of TiFlash, TiDB 4.0 became a true HTAP database.<\/strong> If you are interested in knowing more about the HTAP design, I recommend that you read the paper, <em>TiDB: A Raft-based HTAP Database<\/em><sup>[1]<\/sup>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"A_smart_MPP_design_and_an_even_smarter_optimizer\"><\/span><strong>A smart MPP design and an even smarter optimizer<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>TiDB became a true HTAP database when it introduced TiFlash in TiDB 4.0, but we still had a technical issue to address: <strong>TiSpark was the only distributed query engine in the TiDB ecosystem.<\/strong> It was not suitable for small- to medium-sized interactive cases, and its MR-style shuffle model was quite heavy. We needed a native computation engine in the distributed framework with an MPP style. Moreover, we reached a sticking point: to add new features or optimize our code, we\u2019d have to modify the Spark engine itself. 
But if we did that, it would be a heavy burden to keep in sync with the upstream.<strong>&nbsp;<\/strong><\/p>\n\n\n\n<p><strong>After quite a few intense debates, we decided to go for the MPP architecture in <\/strong><a href=\"\/blog\/tidb-5-0-a-one-stop-htap-database-solution\/\"><strong>TiDB 5.0<\/strong><\/a>. With this new architecture, TiFlash would be more than a storage node:<strong> it would be a fully-functioning analytical engine<\/strong>. The TiDB server would still be the single entrance to SQL, and the optimizer would choose the most efficient query execution plan based on cost, but it had one more option: the MPP engine.<\/p>\n\n\n\n<p>In TiDB\u2019s MPP mode, TiFlash complements the computing capabilities of the TiDB servers. When the TiDB server deals with OLAP workloads, it steps back to be a master node. The user sends a request to the TiDB server, and all TiDB servers perform table joins and submit the results to the optimizer for decision making. The optimizer assesses all the possible execution plans (row-based, column-based, indexes, single-server engine, and MPP engine) and chooses the optimal one.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"473\" src=\"https:\/\/static.pingcap.com\/files\/2023\/04\/14011204\/image-71.png\" alt=\"\" class=\"wp-image-11610\" srcset=\"https:\/\/static.pingcap.com\/files\/2023\/04\/14011204\/image-71.png 1024w, https:\/\/static.pingcap.com\/files\/2023\/04\/14011204\/image-71-300x139.png 300w, https:\/\/static.pingcap.com\/files\/2023\/04\/14011204\/image-71-768x355.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>TiDB\u2019s MPP mode<\/em><\/p>\n\n\n\n<p>The following diagram shows how the analytical engine breaks down and processes the execution plan in TiDB\u2019s MPP mode. 
Each dotted box represents the physical border of a node.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"579\" src=\"https:\/\/static.pingcap.com\/files\/2023\/04\/14011213\/image-72.png\" alt=\"\" class=\"wp-image-11611\" srcset=\"https:\/\/static.pingcap.com\/files\/2023\/04\/14011213\/image-72.png 1024w, https:\/\/static.pingcap.com\/files\/2023\/04\/14011213\/image-72-300x170.png 300w, https:\/\/static.pingcap.com\/files\/2023\/04\/14011213\/image-72-768x434.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>A query execution plan in MPP mode<\/em><\/p>\n\n\n\n<p>Our first version of the MPP framework already exceeded the TPC-H performance of some traditional analytical databases like Greenplum. It also greatly expanded TiDB\u2019s use cases and, in 2021, it helped us acquire quite a few important HTAP customers. All the days and nights we spent at customer sites led to solid improvements to the MPP architecture in TiDB 6.0. The TiFlash engine was maturing fast.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Citius_Altius_Fortius\"><\/span><strong>Citius, Altius, Fortius<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>TiDB 5.0 delivered the first version of the TiFlash analytical engine with an MPP execution mode to serve a wider range of application scenarios. In <a href=\"\/blog\/tidb-6-0-a-leap-towards-an-enterprise-grade-cloud-database\/\">TiDB 6.0<\/a>, we improved TiFlash even more, making it support:\u00a0<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>More operators and functions<\/strong>. The TiDB 6.0 analysis engine adds over 110 built-in functions as well as multiple JOIN operators. Moreover, MPP mode supports the window function framework and partitioned tables. 
This release substantially improves TiDB analysis engine performance, which in turn benefits computing.&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>An optimized thread model<\/strong>. Earlier versions of TiDB placed little restraint on thread resource usage for the MPP mode. This could waste a large amount of resources when the system handled high-concurrency short queries. Also, when performing complex calculations, the MPP engine occupied a lot of threads, which led to performance and stability issues. To address this problem, TiDB 6.0 introduces a flexible thread pool and restructures the way operators hold threads. This optimizes resource usage in MPP mode: with the same computing resources, short queries run several times faster and high-pressure queries are more reliable.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>A more efficient column engine.<\/strong> By adjusting the storage engine\u2019s file structure and I\/O model, TiDB 6.0 not only optimizes the plan for accessing replicas and file blocks on different nodes, but it also reduces write amplification and improves overall code efficiency. Test results from our customers indicate that concurrency capability has improved by 50% to 100% in high read-write hybrid workloads, while CPU and memory resource usage has dropped dramatically.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Looking_ahead\"><\/span><strong>Looking ahead<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Building an HTAP database is a long journey, and our efforts have paid off. More and more TiDB users are benefitting from TiDB\u2019s HTAP abilities for faster decision making, better user experience, and quicker time to market.&nbsp;<\/p>\n\n\n\n<p>Even though we\u2019ve traveled far, there\u2019s still a long road ahead. In fact, HTAP is not purely a technical term; it also represents users\u2019 needs. 
It has been evolving over the years, from in-memory technologies in the very beginning to the various designs of today. For example, SingleStore follows the \u201cclassic\u201d in-memory architecture with a single engine; HeatWave is mainly in-memory with a separate engine; TiDB and AlloyDB use on-disk storage and separate workloads into different resources. Although HTAP practitioners make different design decisions, one thing never changes: users\u2019 needs matter most. HTAP designs will continue to evolve to solve users\u2019 problems smartly. Many user-side problems still need to be solved, such as real-time data modeling and transformation and better use of cloud infrastructure.&nbsp;<\/p>\n\n\n\n<p>Nevertheless, I believe HTAP databases will eventually prevail in the database world. Before that day comes, we will continue our long expedition.<\/p>\n\n\n\n<p>If you are interested in TiDB, you\u2019re welcome to join our <a href=\"https:\/\/slack.tidb.io\/invite?team=tidb-community&amp;channel=everyone&amp;ref=pingcap-blog\">community on Slack<\/a> and TiDB Internals to share your thoughts with us. You can also follow us on <a href=\"https:\/\/twitter.com\/PingCAP\">Twitter<\/a>, <a href=\"https:\/\/www.linkedin.com\/company\/pingcap\/\">LinkedIn<\/a>, and <a href=\"https:\/\/github.com\/pingcap\">GitHub<\/a> for the latest information.&nbsp;<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Keep reading: <\/strong><br><a href=\"https:\/\/www.pingcap.com\/blog\/using-retool-and-tidb-cloud-to-build-a-real-time-kanban-in-30-minutes\/\">Using Retool and TiDB Cloud to Build a Real-Time Kanban in 30 Minutes<\/a><br><a href=\"https:\/\/www.pingcap.com\/blog\/build-a-better-github-insight-tool-in-a-week-a-true-story\/\">Build a Better Github Insight Tool in a Week? 
A True Story<\/a><br><a href=\"https:\/\/www.pingcap.com\/blog\/the-beauty-of-htap-tidb-and-alloydb-as-examples\/\">The Beauty of HTAP: TiDB and AlloyDB as Examples<\/a><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>References:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.vldb.org\/pvldb\/vol13\/p3072-huang.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">TiDB: A Raft-based HTAP Database<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>This post introduces the backstage stories of TiDB and its HTAP capability. <\/p>","protected":false},"author":62,"featured_media":7648,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ub_ctt_via":"","footnotes":""},"categories":[13],"tags":[10,11,9,52],"class_list":["post-7642","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-product","tag-htap","tag-real-time-analytics","tag-scalability","tag-tiflash"],"acf":[],"featured_image_src":"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg","author_info":{"display_name":"Shawn Ma","author_link":"https:\/\/www.pingcap.com\/ko\/blog\/author\/shawn-ma\/"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>The Long Expedition toward Making a Real-Time HTAP Database\u00a0 | TiDB<\/title>\n<meta name=\"description\" content=\"This post introduces some stories behind the HTAP design decisions of TiDB and how we learned from our customers.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" 
content=\"article\" \/>\n<meta property=\"og:title\" content=\"The Long Expedition toward Making a Real-Time HTAP Database\u00a0 | TiDB\" \/>\n<meta property=\"og:description\" content=\"This post introduces some stories behind the HTAP design decisions of TiDB and how we learned from our customers.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pingcap.com\/ko\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/\" \/>\n<meta property=\"og:site_name\" content=\"TiDB\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"article:published_time\" content=\"2022-07-18T10:44:59+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-12-20T12:04:48+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183625-scaled.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1340\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Shawn Ma\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183625-scaled.jpeg\" \/>\n<meta name=\"twitter:creator\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Shawn Ma\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/\"},\"author\":{\"name\":\"Shawn Ma\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/33ba97d494530cb5e04d4b17aaaa9b08\"},\"headline\":\"The Long Expedition toward Making a Real-Time HTAP Database\u00a0\",\"datePublished\":\"2022-07-18T10:44:59+00:00\",\"dateModified\":\"2024-12-20T12:04:48+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/\"},\"wordCount\":2306,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg\",\"keywords\":[\"HTAP\",\"Real-time analytics\",\"Scalability\",\"TiFlash\"],\"articleSection\":[\"Product\"],\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/\",\"url\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/\",\"name\":\"The Long Expedition toward Making a Real-Time HTAP Database\u00a0 | 
TiDB\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg\",\"datePublished\":\"2022-07-18T10:44:59+00:00\",\"dateModified\":\"2024-12-20T12:04:48+00:00\",\"description\":\"This post introduces some stories behind the HTAP design decisions of TiDB and how we learned from our customers.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#primaryimage\",\"url\":\"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg\",\"width\":2560,\"height\":853},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pingcap.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"The Long Expedition toward Making a Real-Time HTAP Database\u00a0\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pingcap.com\/#website\",\"url\":\"https:\/\/www.pingcap.com\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at 
Scale\",\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pingcap.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pingcap.com\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\/\/www.pingcap.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/facebook.com\/pingcap2015\",\"https:\/\/x.com\/PingCAP\",\"https:\/\/linkedin.com\/company\/pingcap\",\"https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/33ba97d494530cb5e04d4b17aaaa9b08\",\"name\":\"Shawn Ma\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"caption\":\"Shawn Ma\"},\"url\":\"https:\/\/www.pingcap.com\/ko\/blog\/author\/shawn-ma\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"The Long Expedition toward Making a Real-Time HTAP Database\u00a0 | TiDB","description":"This post introduces some stories behind the HTAP design decisions of TiDB and how we learned from our customers.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pingcap.com\/ko\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/","og_locale":"ko_KR","og_type":"article","og_title":"The Long Expedition toward Making a Real-Time HTAP Database\u00a0 | TiDB","og_description":"This post introduces some stories behind the HTAP design decisions of TiDB and how we learned from our customers.","og_url":"https:\/\/www.pingcap.com\/ko\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","article_published_time":"2022-07-18T10:44:59+00:00","article_modified_time":"2024-12-20T12:04:48+00:00","og_image":[{"width":2560,"height":1340,"url":"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183625-scaled.jpeg","type":"image\/jpeg"}],"author":"Shawn Ma","twitter_card":"summary_large_image","twitter_image":"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183625-scaled.jpeg","twitter_creator":"@PingCAP","twitter_site":"@PingCAP","twitter_misc":{"Written by":"Shawn Ma","Est. 
reading time":"11\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#article","isPartOf":{"@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/"},"author":{"name":"Shawn Ma","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/33ba97d494530cb5e04d4b17aaaa9b08"},"headline":"The Long Expedition toward Making a Real-Time HTAP Database\u00a0","datePublished":"2022-07-18T10:44:59+00:00","dateModified":"2024-12-20T12:04:48+00:00","mainEntityOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/"},"wordCount":2306,"commentCount":0,"publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg","keywords":["HTAP","Real-time analytics","Scalability","TiFlash"],"articleSection":["Product"],"inLanguage":"ko-KR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/","url":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/","name":"The Long Expedition toward Making a Real-Time HTAP Database\u00a0 | 
TiDB","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#primaryimage"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg","datePublished":"2022-07-18T10:44:59+00:00","dateModified":"2024-12-20T12:04:48+00:00","description":"This post introduces some stories behind the HTAP design decisions of TiDB and how we learned from our customers.","breadcrumb":{"@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#primaryimage","url":"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg","contentUrl":"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg","width":2560,"height":853},{"@type":"BreadcrumbList","@id":"https:\/\/www.pingcap.com\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingcap.com\/"},{"@type":"ListItem","position":2,"name":"The Long Expedition toward Making a Real-Time HTAP Database\u00a0"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"\ud2f0DB","description":"TiDB | SQL at 
Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]},{"@type":"Person","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/33ba97d494530cb5e04d4b17aaaa9b08","name":"Shawn Ma","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/","url":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","contentUrl":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","caption":"Shawn Ma"},"url":"https:\/\/www.pingcap.com\/ko\/blog\/author\/shawn-ma\/"}]}},"grav_blocks":false,"card_markup":"<a class=\"card-resource bg-white\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/the-long-expedition-toward-making-a-real-time-htap-database\/\"><div class=\"card-resource__image-container\"><img class=\"card-resource__image\" alt=\"20220718-183631\" src=\"https:\/\/static.pingcap.com\/files\/2022\/07\/20220718-183631-scaled.jpeg\" loading=\"lazy\" width=2560 height=853 \/><\/div><div class=\"card-resource__content-container\"><div 
class=\"card-resource__content-head\"><div class=\"card-resource__category\">Product<\/div><\/div><h5 class=\"card-resource__title\">The Long Expedition toward Making a Real-Time HTAP Database\u00a0<\/h5><\/div><\/a>","_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/7642","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/users\/62"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/comments?post=7642"}],"version-history":[{"count":21,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/7642\/revisions"}],"predecessor-version":[{"id":24451,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/7642\/revisions\/24451"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media\/7648"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=7642"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/categories?post=7642"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tags?post=7642"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}