{"id":229,"date":"2021-08-19T00:00:00","date_gmt":"2021-08-19T00:00:00","guid":{"rendered":"https:\/\/en.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/"},"modified":"2024-07-02T10:03:50","modified_gmt":"2024-07-02T17:03:50","slug":"building-a-real-time-data-warehouse-with-tidb-and-pravega","status":"publish","type":"post","link":"https:\/\/www.pingcap.com\/ko\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/","title":{"rendered":"Building a Real-Time Data Warehouse with TiDB and Pravega"},"content":{"rendered":"<p>Companies with a lot of data rely on data warehouses for data processing and analytics. To achieve business agility, <strong>offline data warehouses are no longer sufficient, and real-time data warehouses are gradually taking over.<\/strong> Currently, real-time data warehouses often use Apache Flink to consume data from Apache Kafka and stream data into a database. However, because Kafka doesn&#8217;t flush every message to disk before acknowledging it, data can be lost in extreme cases.<\/p>\n\n\n\n<p>After researching the databases and storage systems on the market, <strong>we found a more efficient and accurate real-time data warehouse solution: <a href=\"https:\/\/www.pingcap.com\/ko\/tidb\/\">Pravega<\/a> + <a href=\"https:\/\/www.pingcap.com\/ko\/tidb\/\">TiDB<\/a>.<\/strong><\/p>\n\n\n\n<p>In this article, I&#8217;ll introduce Pravega, a distributed stream storage system, and TiDB, a distributed SQL database. This combination resolves Kafka&#8217;s data persistence dilemma and provides auto-scaling capabilities, improving concurrency, usability, and security for real-time data warehouses. I also provide a <a href=\"https:\/\/github.com\/wangtianyi2004\/tidb-pravega-quick-start\">docker-compose demo<\/a> for you to try Pravega and TiDB. 
I hope you find this article helpful.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Pravega_a_stream_storage_system\"><\/span>Pravega, a stream storage system<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Open sourced by Dell EMC, <a href=\"https:\/\/github.com\/pravega\/pravega\">Pravega<\/a> is a stream storage system and a Cloud Native Computing Foundation (CNCF) sandbox project. It is similar to Kafka and Apache Pulsar and provides stream and schema registry. But Pravega offers more functionalities:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Auto-scaling without application awareness.<\/li>\n\n\n\n<li>A complete storage interface with stream-based abstraction to support unified access by upper-level compute engines.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1282\" height=\"682\" src=\"https:\/\/www.pingcap.com\/core\/uploads\/2021\/08\/pravega-architecture.png\" alt=\"Pravega architecture\" class=\"wp-image-3137\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-architecture.png 1282w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-architecture-300x160.png 300w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-architecture-1024x545.png 1024w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-architecture-768x409.png 768w\" sizes=\"auto, (max-width: 1282px) 100vw, 1282px\" \/><\/figure>\n\n\n\n<p>In distributed systems, client applications and messaging systems often use message queues to asynchronously transfer messages. When it comes to message queues, everyone thinks about Kafka. Kafka is a distributed log system based on Zookeeper. It supports multiple partitions, multiple replicas, and multiple consumers.<\/p>\n\n\n\n<p>Pravega, in contrast, is a new stream storage system, built to solve the problems that Kafka cannot. It refactors the architecture of stream storage. 
As a real-time stream storage solution, Pravega natively supports long-term data retention. Pravega writes data on the Hadoop Distributed File System (HDFS) or S3, thus eliminating the concerns over data persistence. Moreover, Pravega only stores one copy of data across the whole system.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1999\" height=\"904\" src=\"https:\/\/www.pingcap.com\/core\/uploads\/2021\/08\/pravega-solves-problems-kafka-cannot.png\" alt=\"Pravega's design solves problems Kafka cannot\" class=\"wp-image-230\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-solves-problems-kafka-cannot.png 1999w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-solves-problems-kafka-cannot-300x136.png 300w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-solves-problems-kafka-cannot-1024x463.png 1024w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-solves-problems-kafka-cannot-768x347.png 768w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-solves-problems-kafka-cannot-1536x695.png 1536w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-solves-problems-kafka-cannot-1440x651.png 1440w\" sizes=\"auto, (max-width: 1999px) 100vw, 1999px\" \/><\/figure>\n\n\n\n<div class=\"caption-center\">Pravega&#8217;s design solves problems Kafka cannot<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Why Pravega prevails over Kafka<\/h3>\n\n\n\n<p>You may wonder, &#8220;Why reinvent the wheel when there&#8217;s already Kafka?&#8221; When I used Kafka, I was troubled by three problems: data loss, data retention, and consumer rebalance.<\/p>\n\n\n\n<p>Kafka takes in more information than it gives out. 
Depending on the producer&#8217;s <code>acks<\/code> setting, there is a risk of data loss.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>When you set <code>acks = all<\/code>, the ACK is returned only after all in-sync replicas confirm that the message is saved. This setting gives the strongest durability guarantee of the three.<\/li>\n\n\n\n<li>When <code>acks = 1<\/code>, the ACK is returned as soon as the leader replica saves the message. If the leader shuts down before the followers replicate the data, data is lost.<\/li>\n\n\n\n<li>When <code>acks = 0<\/code>, the producer does not wait for any acknowledgement from the broker. If the broker shuts down, data is lost.<\/li>\n<\/ul>\n\n\n\n<p>Kafka doesn&#8217;t provide a simple and efficient solution for persisting data to HDFS or S3, so data retention becomes a problem. Although Confluent offers a solution, you have to use two sets of storage interfaces to access data from different layers.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use Apache Flume to access data through Kafka -> Flume -> HDFS.<\/li>\n\n\n\n<li>Use kafka-hadoop-loader to access data through Kafka -> kafka-hadoop-loader -> HDFS.<\/li>\n\n\n\n<li>Use Kafka Connect HDFS to access data through Kafka -> Kafka Connect HDFS -> HDFS.<\/li>\n<\/ul>\n\n\n\n<p>Consumer rebalancing is also harmful. When new consumers are added, the consumer group might stop consuming messages during the rebalance. Because of the long commit interval, consumers might repeatedly process data. Either way, rebalancing might cause a message backlog, which increases latency.<\/p>\n\n\n\n<p>Compared to Kafka, Pravega offers more features:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1372\" height=\"690\" src=\"https:\/\/www.pingcap.com\/core\/uploads\/2021\/08\/pravega-vs-kafka.png\" alt=\"Pravega vs. 
Kafka\" class=\"wp-image-231\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-vs-kafka.png 1372w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-vs-kafka-300x151.png 300w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-vs-kafka-1024x515.png 1024w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-vs-kafka-768x386.png 768w\" sizes=\"auto, (max-width: 1372px) 100vw, 1372px\" \/><\/figure>\n\n\n\n<div class=\"caption-center\">Pravega vs. Kafka<\/div>\n\n\n\n<p>Pravega uses Apache BookKeeper to write concurrent, real-time data with low latency. However, BookKeeper only serves as a cache layer for batch writes. All read requests to Pravega are made directly to HDFS or S3 to take advantage of their high-throughput capabilities.<\/p>\n\n\n\n<p>In other words, Pravega does not use BookKeeper as a data buffer layer, but provides an HDFS or S3-based storage layer. This storage layer supports abstractions for both <strong>low-latency tailing read and write<\/strong> <strong>and<\/strong> <strong>high-throughput catchup read<\/strong>. Systems that use BookKeeper as a separate layer might perform poorly when data moves between BookKeeper and HDFS or S3. Pravega, by contrast, ensures satisfactory performance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Benefits of Pravega<\/h3>\n\n\n\n<p>Usually, DBAs have three main concerns: <strong>data accuracy, system stability, and system usability<\/strong>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data accuracy is vitally important. Any data loss, corruption, or duplication would be a catastrophe.<\/li>\n\n\n\n<li>System stability and usability relieve DBAs from tedious maintenance procedures, so they can invest their time in improving the system.<\/li>\n<\/ul>\n\n\n\n<p>Pravega addresses these DBA concerns. 
Its long-term retention ensures data safety, exactly-once semantics guarantees data accuracy, and auto-scaling makes system maintenance a breeze.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Real-time_data_warehouse_architecture\"><\/span>Real-time data warehouse architecture<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>A real-time data warehouse usually has four components: <strong>data collection layer, data storage layer, real-time computing layer, and real-time application layer<\/strong>. By integrating multiple technologies into a seamless architecture, we can build an extensible big data architecture that supports data analytics and mining, online transactions, and unified batch and stream processing.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1999\" height=\"776\" src=\"https:\/\/www.pingcap.com\/core\/uploads\/2021\/08\/pravega-four-components-in-a-real-time-data-warehouse.png\" alt=\"Four components in a real-time data warehouse\" class=\"wp-image-232\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-four-components-in-a-real-time-data-warehouse.png 1999w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-four-components-in-a-real-time-data-warehouse-300x116.png 300w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-four-components-in-a-real-time-data-warehouse-1024x398.png 1024w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-four-components-in-a-real-time-data-warehouse-768x298.png 768w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-four-components-in-a-real-time-data-warehouse-1536x596.png 1536w, https:\/\/static.pingcap.com\/files\/2021\/08\/pravega-four-components-in-a-real-time-data-warehouse-1440x559.png 1440w\" sizes=\"auto, (max-width: 1999px) 100vw, 1999px\" \/><\/figure>\n\n\n\n<div class=\"caption-center\">Four components in a real-time data warehouse<\/div>\n\n\n\n<p>There are various choices for the 
data storage layer, but not all of them are suitable for a real-time data warehouse:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Hadoop or traditional OLAP databases can&#8217;t provide satisfactory real-time processing.<\/li>\n\n\n\n<li>NoSQL solutions like HBase can scale and process data in real time, but can&#8217;t provide analysis.<\/li>\n\n\n\n<li>Standalone relational databases can&#8217;t scale out to accommodate massive data.<\/li>\n<\/ul>\n\n\n\n<p>TiDB, however, addresses all these needs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">TiDB, a distributed HTAP database<\/h3>\n\n\n\n<p><a href=\"https:\/\/www.pingcap.com\/ko\/products\/tidb\/\">TiDB<\/a> is an open source, distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.<\/p>\n\n\n\n<p><strong>Compared to other open source databases, TiDB is more suitable for building real-time data warehouses because of its HTAP architecture.<\/strong> TiDB possesses a hybrid storage layer consisting of TiKV, a row-based storage engine, and TiFlash, a columnar storage engine. The two storage engines use TiDB as a shared SQL layer. 
TiDB answers online transactional processing (OLTP) and online analytical processing (OLAP) queries, and fetches data from either engine based on the cost of the execution plan.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1786\" height=\"948\" src=\"https:\/\/www.pingcap.com\/core\/uploads\/2021\/08\/tidb-5.0-htap-architecture.png\" alt=\"TiDB HTAP architecture\" class=\"wp-image-233\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-htap-architecture.png 1786w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-htap-architecture-300x159.png 300w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-htap-architecture-1024x544.png 1024w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-htap-architecture-768x408.png 768w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-htap-architecture-1536x815.png 1536w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-htap-architecture-1440x764.png 1440w\" sizes=\"auto, (max-width: 1786px) 100vw, 1786px\" \/><\/figure>\n\n\n\n<div class=\"caption-center\">TiDB HTAP architecture<\/div>\n\n\n\n<p>Moreover, TiDB 5.0 introduces <a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\/release-5.0.0#mpp-architecture\">the Massively Parallel Processing (MPP) architecture<\/a>. In MPP mode, TiFlash complements TiDB&#8217;s computing capabilities. When dealing with OLAP workloads, TiDB becomes a master node. The user sends a request to TiDB server, and all TiDB servers perform table joins and submit the result to the optimizer for decision making. 
The optimizer assesses all the possible execution plans (row-based, column-based, indexes, single-server engine, and MPP engine) and chooses the optimal one.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1999\" height=\"924\" src=\"https:\/\/www.pingcap.com\/core\/uploads\/2021\/08\/tidb-5.0-mpp-mode.jpg\" alt=\"TiDB's MPP mode\" class=\"wp-image-234\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-mpp-mode.jpg 1999w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-mpp-mode-300x139.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-mpp-mode-1024x473.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-mpp-mode-768x355.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-mpp-mode-1536x710.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/08\/tidb-5.0-mpp-mode-1440x666.jpg 1440w\" sizes=\"auto, (max-width: 1999px) 100vw, 1999px\" \/><\/figure>\n\n\n\n<div class=\"caption-center\">TiDB&#8217;s MPP mode<\/div>\n\n\n\n<p>For example, an order processing system may experience a sudden traffic peak during a sales campaign. During that peak, businesses need to perform quick analytics so they can timely react and respond to customer behaviors. Traditional data warehouses can hardly cope with flooding data in a short period of time, and it might take a long time to perform the follow-up data analytical processing.<\/p>\n\n\n\n<p>With the MPP computing engine, <strong>TiDB can anticipate the coming traffic peak and dynamically scale out the cluster to provide more resources for the campaign<\/strong>. It can then easily respond to aggregation and analytical requests within seconds.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When TiDB meets Pravega<\/h3>\n\n\n\n<p>With the help of Flink, TiDB teams up with Pravega to build a real-time, high-throughput, stable data warehouse. 
This data warehouse is able to meet various user requirements for big data and handle OLTP and OLAP workloads in one stop.<\/p>\n\n\n\n<p>To better showcase the usage of Pravega and TiDB, we provide a <a href=\"https:\/\/github.com\/wangtianyi2004\/tidb-pravega-quick-start\">demo based on docker-compose<\/a>, which demonstrates how data flows from Pravega through Flink to TiDB. You can write and commit Flink jobs via Flink SQL client and observe the execution at <code>&lt;HOST_IP&gt;:8081<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Join_us\"><\/span>Join us!<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>If you are interested in building a real-time data warehouse using TiDB and Pravega, <a href=\"https:\/\/slack.tidb.io\/invite?team=tidb-community&amp;channel=everyone&amp;ref=pingcap-blog\">join us on Slack<\/a> to explore the latest solution of data warehouses.<\/p>","protected":false},"excerpt":{"rendered":"<p>This article introduces a new solution for real-time data warehouse: Pravega + TiDB. 
This combination resolves Kafka&#8217;s data persistence dilemma and provides auto scaling capabilities.<\/p>","protected":false},"author":25,"featured_media":236,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ub_ctt_via":"","footnotes":""},"categories":[18],"tags":[41,11],"class_list":["post-229","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-community","tag-big-data","tag-real-time-analytics"],"acf":[],"featured_image_src":"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg","author_info":{"display_name":"Tianyi Wang","author_link":"https:\/\/www.pingcap.com\/ko\/blog\/author\/tianyi-wang\/"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Building a Real-Time Data Warehouse with TiDB and Pravega | TiDB<\/title>\n<meta name=\"description\" content=\"In this article, we will introduce Pravega and TiDB, and provide a docker-compose demo for you to try building a real-time data warehouse.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building a Real-Time Data Warehouse with TiDB and Pravega | TiDB\" \/>\n<meta property=\"og:description\" content=\"In this article, we will introduce Pravega and TiDB, and provide a docker-compose demo for you to try building a real-time data warehouse.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pingcap.com\/ko\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/\" \/>\n<meta property=\"og:site_name\" 
content=\"TiDB\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"article:published_time\" content=\"2021-08-19T00:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-07-02T17:03:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1501\" \/>\n\t<meta property=\"og:image:height\" content=\"500\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Tianyi Wang\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tianyi Wang\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/\"},\"author\":{\"name\":\"Tianyi Wang\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/18bf878bbba5f03120b3ed3c2204ab11\"},\"headline\":\"Building a Real-Time Data Warehouse with TiDB and Pravega\",\"datePublished\":\"2021-08-19T00:00:00+00:00\",\"dateModified\":\"2024-07-02T17:03:50+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/\"},\"wordCount\":1311,\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg\",\"keywords\":[\"Big Data\",\"Real-time analytics\"],\"articleSection\":[\"Community\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/\",\"url\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/\",\"name\":\"Building a Real-Time Data Warehouse with TiDB and Pravega | 
TiDB\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg\",\"datePublished\":\"2021-08-19T00:00:00+00:00\",\"dateModified\":\"2024-07-02T17:03:50+00:00\",\"description\":\"In this article, we will introduce Pravega and TiDB, and provide a docker-compose demo for you to try building a real-time data warehouse.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#primaryimage\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg\",\"width\":1501,\"height\":500},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pingcap.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building a Real-Time Data Warehouse with TiDB and 
Pravega\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pingcap.com\/#website\",\"url\":\"https:\/\/www.pingcap.com\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at Scale\",\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pingcap.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pingcap.com\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\/\/www.pingcap.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/facebook.com\/pingcap2015\",\"https:\/\/x.com\/PingCAP\",\"https:\/\/linkedin.com\/company\/pingcap\",\"https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/18bf878bbba5f03120b3ed3c2204ab11\",\"name\":\"Tianyi Wang\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"caption\":\"Tianyi Wang\"},\"description\":\"Database Architect\",\"url\":\"https:\/\/www.pingcap.com\/ko\/blog\/author\/tianyi-wang\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Building a Real-Time Data Warehouse with TiDB and Pravega | TiDB","description":"In this article, we will introduce Pravega and TiDB, and provide a docker-compose demo for you to try building a real-time data warehouse.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pingcap.com\/ko\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/","og_locale":"ko_KR","og_type":"article","og_title":"Building a Real-Time Data Warehouse with TiDB and Pravega | TiDB","og_description":"In this article, we will introduce Pravega and TiDB, and provide a docker-compose demo for you to try building a real-time data warehouse.","og_url":"https:\/\/www.pingcap.com\/ko\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","article_published_time":"2021-08-19T00:00:00+00:00","article_modified_time":"2024-07-02T17:03:50+00:00","og_image":[{"width":1501,"height":500,"url":"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg","type":"image\/jpeg"}],"author":"Tianyi Wang","twitter_card":"summary_large_image","twitter_creator":"@PingCAP","twitter_site":"@PingCAP","twitter_misc":{"Written by":"Tianyi Wang","Est. 
reading time":"7\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#article","isPartOf":{"@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/"},"author":{"name":"Tianyi Wang","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/18bf878bbba5f03120b3ed3c2204ab11"},"headline":"Building a Real-Time Data Warehouse with TiDB and Pravega","datePublished":"2021-08-19T00:00:00+00:00","dateModified":"2024-07-02T17:03:50+00:00","mainEntityOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/"},"wordCount":1311,"publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg","keywords":["Big Data","Real-time analytics"],"articleSection":["Community"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/","url":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/","name":"Building a Real-Time Data Warehouse with TiDB and Pravega | TiDB","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#primaryimage"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg","datePublished":"2021-08-19T00:00:00+00:00","dateModified":"2024-07-02T17:03:50+00:00","description":"In this 
article, we will introduce Pravega and TiDB, and provide a docker-compose demo for you to try building a real-time data warehouse.","breadcrumb":{"@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#primaryimage","url":"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg","width":1501,"height":500},{"@type":"BreadcrumbList","@id":"https:\/\/www.pingcap.com\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingcap.com\/"},{"@type":"ListItem","position":2,"name":"Building a Real-Time Data Warehouse with TiDB and Pravega"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"TiDB","description":"TiDB | SQL at 
Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]},{"@type":"Person","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/18bf878bbba5f03120b3ed3c2204ab11","name":"Tianyi Wang","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/","url":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","contentUrl":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","caption":"Tianyi Wang"},"description":"Database Architect","url":"https:\/\/www.pingcap.com\/ko\/blog\/author\/tianyi-wang\/"}]}},"grav_blocks":false,"card_markup":"<a class=\"card-resource bg-white\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/building-a-real-time-data-warehouse-with-tidb-and-pravega\/\"><div class=\"card-resource__image-container\"><img class=\"card-resource__image\" alt=\"building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg\" 
src=\"https:\/\/static.pingcap.com\/files\/2021\/08\/building-a-real-time-data-warehouse-with-tidb-and-pravega.jpg\" loading=\"lazy\" width=1501 height=500 \/><\/div><div class=\"card-resource__content-container\"><div class=\"card-resource__content-head\"><div class=\"card-resource__category\">Community<\/div><\/div><h5 class=\"card-resource__title\">Building a Real-Time Data Warehouse with TiDB and Pravega<\/h5><\/div><\/a>","_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/229","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/users\/25"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/comments?post=229"}],"version-history":[{"count":8,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/229\/revisions"}],"predecessor-version":[{"id":18035,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/229\/revisions\/18035"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media\/236"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=229"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/categories?post=229"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tags?post=229"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}