For decades, Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) databases have been the norm in data infrastructure. OLTP databases handle transactional data processing. OLAP databases handle analytical queries on data imported from OLTP databases through an extract, transform, and load (ETL) process. However, this OLTP+ETL+OLAP approach is expensive and complicated, and it cannot keep up with the growing demand for timely analytics on the latest data as businesses grow.
Hybrid Transactional and Analytical Processing (HTAP) databases come to the rescue by processing OLTP and OLAP workloads in the same architecture. With HTAP, data can be queried and analyzed almost immediately after a transaction takes effect, and transactional data and analytical results stay strongly consistent in real time.
The term HTAP was coined by Gartner in 2014, but for years it remained more buzzword than reality because of technical challenges. Recently, however, with the emergence of modern systems such as Greenplum, TiDB, AlloyDB, and UniStore, HTAP appears to be on the rise.
In this article, we will walk through the major technical paths of HTAP databases and compare the representative databases along each path.
Technical paths to HTAP
An HTAP database combines OLTP and OLAP capabilities in the same database. There are two basic technical routes: either extend an OLAP database with transaction processing capabilities, or extend an OLTP database with analytical processing capabilities.
HTAP extended from OLAP
OLAP databases are designed for workloads in which data is written once and read many times, so the focus is on scan performance. Most OLAP databases use columnar storage, together with other key technologies such as Massively Parallel Processing (MPP) and vectorized execution. However, high-performance transaction processing requires frequent, fast access to individual rows, which columnar storage engines are not designed for. For a column-based OLAP database to provide HTAP capabilities, it must support both bulk columnar storage and highly concurrent modifications and queries against single rows.
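To make the row-versus-column tension concrete, here is a toy Python model (our own illustration, not the storage layout of any particular database) that stores the same table both ways and counts how many separate storage units a single-row insert or lookup touches. In a columnar engine each column typically lives in its own file or block, so single-row operations fan out across all of them.

```python
# Toy model (an illustrative assumption): the same 3-column table stored
# row-wise and column-wise. Each method also returns how many separate
# storage units (records or column arrays) the operation touched.
COLUMNS = ("id", "customer", "amount")


class RowStore:
    """Each row is one contiguous record."""

    def __init__(self):
        self.records = []

    def insert(self, row):
        self.records.append(tuple(row))
        return 1  # one record written

    def get(self, idx):
        return self.records[idx], 1  # one record read


class ColumnStore:
    """Each column is a separate array (in a real engine, a separate file or block)."""

    def __init__(self):
        self.columns = {name: [] for name in COLUMNS}

    def insert(self, row):
        for name, value in zip(COLUMNS, row):
            self.columns[name].append(value)
        return len(COLUMNS)  # one append per column array

    def get(self, idx):
        row = tuple(self.columns[name][idx] for name in COLUMNS)
        return row, len(COLUMNS)  # one read per column array


rs, cs = RowStore(), ColumnStore()
for r in [(1, "alice", 9.5), (2, "bob", 20.0)]:
    rs.insert(r)
    cs.insert(r)
print(rs.get(1))  # ((2, 'bob', 20.0), 1)
print(cs.get(1))  # ((2, 'bob', 20.0), 3)
```

With three columns the gap is small, but a wide table with dozens of columns multiplies every single-row write and point read accordingly.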
HTAP extended from OLTP
OLTP databases are designed for transaction processing. Backed by technologies such as row storage, high-concurrency control, and disaster recovery, they excel at handling large numbers of concurrent small writes, with a focus on quickly locating and writing specific rows. However, to run high-performance analytical queries, a row-based OLTP database must support complex calculations, such as aggregations and joins, over massive amounts of data that often involve only a few specific fields. Row storage engines are not designed for this. The biggest challenge in transforming an OLTP database into an HTAP database is handling complex queries on massive amounts of row-based data.
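The mirror-image problem for row stores can be sketched with a similar toy cost model (an assumption for illustration; real engines read pages, not individual values): computing SUM(amount) forces a row layout to read every field of every row, while a column layout reads only the amount column. The schema and helper names here are hypothetical.

```python
# Toy cost model (an illustrative assumption): count how many field values
# each layout must read to compute SUM(amount). Real engines read pages.


def make_rows(n):
    """Hypothetical order rows: (id, customer, region, amount)."""
    return [(i, f"c{i % 7}", "east" if i % 2 else "west", float(i)) for i in range(n)]


def sum_amount_row_layout(rows):
    """Row layout: each full tuple is fetched although only `amount` is needed."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # the whole row is read
        total += row[3]
    return total, values_read


def to_columns(rows):
    """Pivot a non-empty list of rows into one list per column."""
    return [list(col) for col in zip(*rows)]


def sum_amount_column_layout(columns):
    """Column layout: only the `amount` column is scanned."""
    amount = columns[3]
    return sum(amount), len(amount)


rows = make_rows(1000)
print(sum_amount_row_layout(rows))                 # (499500.0, 4000)
print(sum_amount_column_layout(to_columns(rows)))  # (499500.0, 1000)
```

Both layouts produce the same answer; the difference is that the row layout reads four times as many values here, and the ratio grows with the number of columns in the table.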
Typical HTAP databases along each path
A growing number of products follow each of these two paths. Greenplum and TiDB are typical examples.
OLAP-extended HTAP: Greenplum as an example
Greenplum is an MPP OLAP database built on a shared-nothing architecture.
Greenplum 6.0 enhanced its OLTP capabilities mainly through multi-format storage.
Greenplum supports different storage formats for hot and cold data, and it presents them to end users as a single table through logically partitioned tables. To improve OLTP performance, data that changes frequently is kept in row storage. The trade-off is weaker analytical performance on that data until it is converted to columnar storage.
Multiple storage formats in Greenplum
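The hot/cold split can be modeled in a few lines of Python. This is a hypothetical sketch of the idea only: in Greenplum, storage formats are declared per partition in SQL, and the class, method names, and date cutoff below are invented for illustration. Hot rows sit in a row-format partition that is cheap to update, cold rows sit in a column-format partition that is cheap to scan, and one method answers queries across both, as a logically partitioned table would.

```python
from datetime import date


class PartitionedTable:
    """Toy model of one logical table whose partitions use different storage formats.

    Hypothetical sketch: rows dated on or after `hot_since` go to a row-format
    partition; older rows go to a column-format partition. Users see one table.
    """

    def __init__(self, hot_since: date):
        self.hot_since = hot_since
        self.hot_rows = {}                                    # row format: id -> (day, amount)
        self.cold_cols = {"id": [], "day": [], "amount": []}  # column format

    def _append_cold(self, row_id, day, amount):
        self.cold_cols["id"].append(row_id)
        self.cold_cols["day"].append(day)
        self.cold_cols["amount"].append(amount)

    def insert(self, row_id: int, day: date, amount: float) -> None:
        if day >= self.hot_since:
            self.hot_rows[row_id] = (day, amount)
        else:
            self._append_cold(row_id, day, amount)

    def update_amount(self, row_id: int, amount: float) -> None:
        # Updates are expected only on hot data; this toy raises KeyError for cold rows.
        day, _ = self.hot_rows[row_id]
        self.hot_rows[row_id] = (day, amount)

    def total_amount(self) -> float:
        # Single-table interface: the query spans both partitions. The hot part
        # is scanned row by row, which is the analytical cost of row storage.
        return sum(self.cold_cols["amount"]) + sum(a for _, a in self.hot_rows.values())

    def age_out(self, new_hot_since: date) -> int:
        """Convert rows that are no longer hot from row format to column format."""
        cooled = [rid for rid, (day, _) in self.hot_rows.items() if day < new_hot_since]
        for rid in cooled:
            day, amount = self.hot_rows.pop(rid)
            self._append_cold(rid, day, amount)
        self.hot_since = new_hot_since
        return len(cooled)


t = PartitionedTable(hot_since=date(2022, 9, 1))
t.insert(1, date(2022, 8, 15), 10.0)  # cold: goes to the column partition
t.insert(2, date(2022, 9, 20), 5.0)   # hot: goes to the row partition
t.update_amount(2, 7.5)
print(t.total_amount())               # 17.5
print(t.age_out(date(2022, 10, 1)))   # 1 (row 2 converted to column format)
```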
Because Greenplum relies on MPP technology, the cluster needs to redistribute data when it is scaled out. If the data volume is large, this may take a long time.
OLTP-extended HTAP: TiDB as an example
TiDB is an open-source distributed SQL database with strong consistency, distributed transactions, horizontal scaling, and MySQL compatibility.
TiDB began as a typical OLTP database. It then introduced TiFlash, a columnar storage engine, in TiDB 4.0 and added an MPP execution mode in TiDB 5.0, completing its transformation into an HTAP database. TiFlash is a columnar extension of TiKV, TiDB’s row-based storage engine. It provides good workload isolation while guaranteeing strong consistency.
TiDB extends the Raft consensus protocol to replicate transactional data from TiKV to TiFlash, converting it from row format to column format along the way. Keeping this columnar copy in TiFlash enables fast analytical queries on transactional data. Because data is available in both row and column formats, TiDB can process OLTP and OLAP workloads at the same time. Unlike Greenplum, TiDB does not treat hot and cold data differently, so analytical performance is consistent across all data. The obvious cost of TiDB’s HTAP design is extra storage, since data is kept in both formats.
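The row-to-column replication path can be sketched as a toy Raft log replay, assuming the learner-based scheme described in TiFlash’s documentation: a columnar replica receives the log but does not vote, and it catches up to the leader’s latest committed index before serving a read. All class and method names are hypothetical, and a real read index also requires the leader to confirm its leadership, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    index: int     # Raft log index (1-based)
    row_id: int
    amount: float  # value written by the committed transaction


class RowLeader:
    """Stands in for a row-store leader: commits row-format writes to a log."""

    def __init__(self):
        self.log = []
        self.rows = {}

    def commit(self, row_id, amount):
        entry = LogEntry(len(self.log) + 1, row_id, amount)
        self.log.append(entry)
        self.rows[row_id] = amount
        return entry.index

    def read_index(self):
        # Latest committed index that a consistent read must observe (simplified).
        return len(self.log)


class ColumnLearner:
    """Stands in for a columnar learner: replays the log into column arrays.
    It does not vote, so the leader's commits never wait for it."""

    def __init__(self):
        self.applied_index = 0
        self.ids, self.amounts = [], []
        self.pos = {}  # row_id -> position in the column arrays

    def apply(self, entries):
        for e in entries:
            if e.index <= self.applied_index:
                continue  # already applied
            if e.row_id in self.pos:
                self.amounts[self.pos[e.row_id]] = e.amount
            else:
                self.pos[e.row_id] = len(self.ids)
                self.ids.append(e.row_id)
                self.amounts.append(e.amount)
            self.applied_index = e.index

    def consistent_sum(self, leader):
        # Learner read: catch up to the leader's read index before answering.
        target = leader.read_index()
        if self.applied_index < target:
            self.apply(leader.log[self.applied_index:target])
        return sum(self.amounts)


leader, learner = RowLeader(), ColumnLearner()
leader.commit(1, 10.0)
leader.commit(2, 5.0)
learner.apply(leader.log[:1])          # replication lags behind the leader
leader.commit(2, 8.0)                  # update row 2
print(learner.consistent_sum(leader))  # 18.0
```

Even though the learner had applied only the first entry, the read waits until it reaches the leader’s committed index, so the analytical result reflects the latest update to row 2.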
TiDB’s HTAP architecture
In addition, TiDB can scale out to hundreds of nodes thanks to its Raft-based consensus protocol. While the cluster scales, the system automatically rebalances data, and services stay online.
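Rebalancing by small key ranges (TiDB calls them Regions) is what lets data move while the cluster keeps serving. The greedy loop below is a hypothetical simplification: it moves one region at a time from the store holding the most regions to the one holding the fewest. TiDB’s Placement Driver (PD) uses its own, more elaborate scheduling policies.

```python
def rebalance(placement, new_store):
    """Toy region rebalancer (hypothetical, not PD's algorithm).

    `placement` maps store name -> list of region ids. Regions are moved one at
    a time from the fullest store to the emptiest until counts differ by at most
    one. Returns the new placement and the list of (region, src, dst) moves.
    """
    placement = {store: list(regions) for store, regions in placement.items()}
    placement.setdefault(new_store, [])
    moves = []
    while True:
        src = max(placement, key=lambda s: len(placement[s]))
        dst = min(placement, key=lambda s: len(placement[s]))
        if len(placement[src]) - len(placement[dst]) <= 1:
            return placement, moves
        region = placement[src].pop()
        placement[dst].append(region)
        moves.append((region, src, dst))


stores = {"s1": [1, 2, 3, 4], "s2": [5, 6, 7, 8], "s3": [9, 10, 11, 12]}
balanced, moves = rebalance(stores, "s4")
print({s: len(r) for s, r in balanced.items()})  # {'s1': 3, 's2': 3, 's3': 3, 's4': 3}
print(len(moves))                                # 3
```

Each move touches only one small range, which is why, in a design like this, the other regions can keep serving reads and writes while the new node fills up.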
Comparing Greenplum and TiDB
The following table lists the key features of Greenplum and TiDB.
| Feature | Greenplum | TiDB |
|---|---|---|
| Applicable scenarios | Heavy OLAP + light OLTP | Heavy OLTP + light OLAP |
| Database-as-a-Service offering | No | Yes |
| Easy to scale | No | Yes |
| Analytical performance (hot data) | Low | High |
| Analytical performance (cold data) | High | High |
Comparing Greenplum and TiDB, and the technical paths they represent, shows that each has its own advantages and disadvantages.
Greenplum’s HTAP capabilities suit quasi-real-time data warehousing and analysis where online analytics is the main focus, including:
- Complex data analytics and processing, with the need for quasi-real-time data insertion and update
- Complex data analytics and processing for cold data, where demand for hot data analytics is not high
TiDB is suitable for real-time analytical queries on operational data, including:
- Highly concurrent OLTP operations with a large volume of growing data and fast queries
- Real-time analytics on operational data without affecting the OLTP business
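Running analytics without disturbing OLTP depends on sending each query to the right replica. TiDB’s optimizer chooses between TiKV and TiFlash by cost estimation; the function below is only a toy stand-in for that decision, with an invented cost formula that simply counts the values each layout would read.

```python
def choose_engine(table_rows, matching_rows, columns_needed, table_columns, has_index):
    """Toy cost-based choice between a row store (TiKV-like) and a column
    store (TiFlash-like). Cost = number of values read (an assumption)."""
    # Row store: an index fetches only the matching rows, but each in full;
    # without an index, every row is read in full.
    rows_read = matching_rows if has_index else table_rows
    row_cost = rows_read * table_columns
    # Column store: scans only the needed columns, but across the whole table
    # (this toy ignores any data skipping a real columnar engine might do).
    column_cost = table_rows * columns_needed
    if row_cost <= column_cost:
        return "row-store", row_cost
    return "column-store", column_cost


# Point lookup by primary key: one full row.
print(choose_engine(1_000_000, 1, 8, 8, has_index=True))          # ('row-store', 8)
# Aggregation over one column of the whole table.
print(choose_engine(1_000_000, 1_000_000, 1, 8, has_index=False))  # ('column-store', 1000000)
```

Routing the large scan to the columnar replica keeps it off the row store that serves transactions, which is the workload-isolation property the use cases above rely on.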
HTAP databases emerged because workloads became too challenging for a single conventional OLTP or OLAP database to handle. Whichever technical path they take (OLTP to HTAP or OLAP to HTAP), the growing number of HTAP databases such as TiDB and Greenplum give users more options in the era of digital transformation. Together they help users keep up with fast-growing and complex business workloads.
If you are interested in this topic or are considering an HTAP database, feel free to join our community on Slack and TiDB Internals to share your thoughts with us. You can also follow us on Twitter, LinkedIn, and GitHub for the latest information.
(Edited By: Calvin Weng, Tom Dewan)