As businesses adopt cloud-native architectures, conversations will naturally lead to what we can do to make the database horizontally scalable. The answer will likely be to take a closer look at TiDB.
TiDB is an open source NewSQL database released under the Apache 2.0 License. Because it speaks the MySQL protocol, your existing applications will be able to connect to it using any MySQL connector, and most SQL functionality remains identical (joins, subqueries, transactions, etc.).
Step under the covers, however, and there are differences. If your architecture is based on MySQL with Read Replicas, you’ll see things work a little bit differently with TiDB. In this post, I’ll go through the top five key differences I’ve found between TiDB and MySQL.
With MySQL, it is common to scale-out via replication. Typically you will have one MySQL master with many slaves, each with a complete copy of the data. Using either application logic or technology like ProxySQL, queries are routed to the appropriate server (offloading queries from the master to slaves whenever it is safe to do so).
Scale-out replication works very well for read-heavy workloads, as the query execution can be divided between replication slaves. However, it becomes a bottleneck for write-heavy workloads, since each replica must have a full copy of the data. Another way to look at this is that MySQL Replication scales out SQL processing, but it does not scale out the storage. (By the way, this is true for traditional replication as well as newer solutions such as Galera Cluster and Group Replication.)
TiDB works a little bit differently:
For MySQL users learning TiDB, a simpler explanation is the TiDB servers are like an intelligent proxy that translates SQL into batched key-value requests to be sent to TiKV. TiKV servers store your tables with range-based partitioning. The ranges automatically balance to keep each partition at 96MB (by default, but configurable), and each range can be stored on a different TiKV server. The Placement Driver server keeps track of which ranges are located where and automatically rebalances a range if it becomes too large or too hot.
This design has several advantages of scale-out replication:
MySQL’s default storage engine has been InnoDB since 2010. Internally, InnoDB uses a B+tree data structure, which is similar to what traditional commercial databases use.
By contrast, TiDB uses RocksDB as the storage engine with TiKV. RocksDB has advantages for large datasets because it can compress data more effectively and insert performance does not degrade when indexes can no longer fit in memory.
Note that both MySQL and TiDB support an API that allows new storage engines to be made available. For example, Percona Server and MariaDB both support RocksDB as an option.
Tracking key metrics is an important part of maintaining database health. MySQL centralizes these fast-changing metrics in Performance Schema. Performance Schema is a set of in-memory tables that can be queried via regular SQL queries.
With TiDB, rather than retaining the metrics inside the server, a strategic choice was made to ship the information to a best-of-breed service. Prometheus+Grafana is a common technology stack among operations teams today, and the included graphs make it easy to create your own or configure thresholds for alarms.
If we ignore for a second that not all data definition language (DDL) changes in MySQL are online, a larger challenge when running a distributed MySQL system is externalizing schema changes on all nodes at the same time. Think about a scenario where you have 10 shards and add a column, but each shard takes a different length of time to complete the modification. This challenge still exists without sharding, since replicas will process DDL after a master.
TiDB implements online DDL using the protocol introduced by the Google F1 paper. In short, DDL changes are broken up into smaller transition stages so they can prevent data corruption scenarios, and the system tolerates an individual node being behind up to one DDL version at a time.
The MySQL team has traditionally focused its attention on optimizing performance for online transaction processing (OLTP) queries. That is, the MySQL team spends more time making simpler queries perform better instead of making all or complex queries perform better. There is nothing wrong with this approach since many applications only use simple queries.
TiDB is designed to perform well across hybrid transaction/analytical processing (HTAP) queries. This is a major selling point for those who want real-time analytics on their data because it eliminates the need for batch loads between their MySQL database and an analytics database.
These are my top five observations based on 15 years in the MySQL world and coming to TiDB. While many of them refer to internal differences, I recommend checking out the TiDB documentation on MySQL Compatibility. It describes some of the finer points about any differences that may affect your applications.
Note: The original version of this article was published on opensource.com