Author: Rick Golba (Product Marketing Manager at PingCAP)
Some time ago, when you bought a car, you had only one option for how it was going to get power: it came from an internal combustion engine. There were some variants on that technology, but they all relied on some sort of combustible fuel to generate the power needed to move the car along. More recently, we’ve seen a rise in electric cars. These cars do not have an internal combustion engine; rather, they rely on batteries that can be charged at home or at a charging station to generate the power for the car.
Both technologies have benefits and drawbacks. The internal combustion engine usually can deliver a high amount of power and is reasonably easy to refuel when needed. It has a negative side effect in that internal combustion engines are viewed as a primary pollution source. The electric engine is thought to be more ecologically sensitive, but it can be difficult to get enough power to drive a larger vehicle, and the need to recharge often can lead to range limitations.
One solution to this is the hybrid vehicle. Such a car has both an internal combustion engine and batteries for an electric engine. This means that it can operate using only one or, in some cases, both engines. You get the environmental benefits of an electric vehicle along with the range and power of an internal combustion car.
Hybrid Transactional and Analytical Processing databases
As it turns out, the same concepts can be applied to databases. For a long time, the only option we had was row store databases. They were great at processing transactions, getting data written and replicated, and processing standard queries.
Over time, those databases became constrained as data volumes grew and queries became more complex. Even though they were great at processing transactions, some queries were overly time-consuming and used far too many resources to be run regularly. Enter analytical databases, which usually store data in a columnar format. These databases can process intense analytical queries with amazing speed but often have limitations in terms of getting data written since the data needs to be broken out into its columnar components.
Now we have the Hybrid Transactional and Analytical Processing (HTAP) database. Like the hybrid car, you get the benefits of the transactional database, plus the goodness of an analytical database, all in a single package.
But do you need such a database? As in many cases, the answer is “it depends”.
Let’s consider a retail store. Depending on the size of the store and the volume of activity, an HTAP database like TiDB may or may not be a good fit. If this store is in a single location and sells a limited stock of goods, then TiDB probably is overkill for them. They could use another open-source product to process transactions and keep track of inventory. If there is a delay between when a product was sold and when it is taken out of the online inventory, it’s probably not a big problem. A store of this size may not even have an online store, so their main customer is the person who walks through the door to make a purchase.
Now, let’s look at a larger store, with a higher volume of products moving in and out, such as one of the big box retailers. In this case, having immediate access to an up-to-date inventory is highly important. For this type of retailer, TiDB is a great option.
When a transaction is processed in TiDB, the information is written simultaneously to the transaction store and the analytics store. TiDB uses a key-value store, called TiKV, to write the transactional request, and a column store, called TiFlash, to write the same data for analytical purposes. The fact that the data is written to both storage locations at the same time is the strength of an HTAP database.
This means that when a purchase is recorded in the transactional store, the purchase (and the corresponding reduction in inventory) is recorded in the analytical engine. There is no need for an Extract, Transform, and Load (ETL) process to run to populate data into the analytical storage. When an ETL process is involved, there is an unavoidable delay in getting the record written to the analytical engine meaning that queries can and will return out of date results. In some cases, this may not be an issue, but if a customer wants to know if a popular item is in stock, getting the most current information to them is crucial.
In financial situations, having access to up to the second data can literally mean savings or loss of money. Consider a case where you are using an analytical database to predict the price of a stock, with a purchase or sale being made once it is expected that the stock will reach a specified price point. If the data is even a few seconds out of date, the analysis is skewed, and the results are less than trustworthy; this can mean the difference between a sale at a profit or a sale at a loss.
Gaming companies have a different need. When playing a game, the user has multiple paths that they can take at any point within the game. Based on their current actions coupled with previous activities, the game can adapt to the individual user’s gameplay. In some cases, the game may make offers of items for purchase to assist a player in the game. It makes little sense if that offer comes after a user has successfully (or unsuccessfully) completed the task it was intended to assist. Having the ability to analyze the activities of a player as they are happening is key to the success of the game. Once again, relying on an ETL process to get the data over to the analytical database creates too much lag for sustainability.
If having the ability to run operational queries with real-time data benefits the company, then looking at a hybrid database makes sense. MySQL compatibility provides an easy path to migration and there is little or no need to alter your application or change the user interface. Built-in high availability provides a stable environment and massively parallel processing of analytical queries provides a greatly enhanced user experience.
An HTAP database is not needed by all companies, but there are organizations for which it makes perfect sense. In much the same way that we are embracing hybrid automobiles for certain needs, we can embrace hybrid databases for certain needs. Industries like those described above see huge benefits to implementing a hybrid database. Such a database continues to meet the need for speedy transactions while also delivering real-time analytics capabilities.
A fully-managed cloud DBaaS for predictable workloads
A fully-managed cloud DBaaS for auto-scaling workloads