
To grow your business, you need fast, reliable access to accurate data. That access is critical to meeting high customer expectations and to accelerating the productivity of your developers and infrastructure engineers.

TiDB 7.1, our first of two planned Long-Term Support (LTS) releases for 2023, makes this easy by providing you with a future-proof database that can power a wide range of business-critical applications. TiDB 7.1 allows you to:

  • Stabilize business-critical workloads by giving multi-workload stability control to DB operators and significantly improving tail latencies.
  • Speed up workloads with fewer resources via architectural enhancements for higher throughput, and even faster online DDL. 

In this post, we’ll discuss key TiDB 7.1 features and enhancements. We’ll also explore the now generally available (GA) features added in prior non-stable releases since the last stable release, TiDB 6.5. 

Some key features were already covered in our TiDB 7.0 release blog, so we won't expand on them in this post: TiFlash's new architecture, TiFlash spill-to-disk query stability, and REORGANIZE LIST/RANGE partitions. For all other details on the delta between TiDB 6.5 (the last LTS) and TiDB 7.1 (the current LTS), please see the Release Notes.

TiDB 7.1 stabilizes business-critical workloads

The features and enhancements in this section fall under the theme of cluster stability. Specifically, stabilizing clusters regardless of workload and stabilizing latencies in workloads with edge cases.

Improved UX of resource control via resource groups

We introduced resource control via resource groups in the previous blog (TiDB 7.0), setting the stage for a more complete multi-tenancy story. Briefly, this feature allows TiDB operators to set resource quotas and priorities for different workloads, so mission-critical workloads can run unaffected by other workloads sharing the cluster.

This gives TiDB administrators substantial control to protect highly critical workloads. However, the feature was only experimental when we last announced it. In TiDB 7.1, it is GA with two enhancements:

  • Removing the small latency degradation observed in very write-heavy workloads
  • Adding native workload calibration tooling to help operators set more accurate resource quotas

New enhancements

Regarding the first enhancement: resource groups control workloads through resource quotas and priorities, and priority is enforced at the storage layer. When workloads are very write-heavy, rescheduling transactions based on priority could create higher tail latencies. The GA version brings tail latency back to expected levels.

The second enhancement improves the user experience of this feature. Setting resource quotas in an abstract unit like request units (RUs) can be difficult without a frame of reference for how RUs map to your workload(s). Operators can now gain this knowledge in a couple of ways.

The fastest is to run a calibration command from the SQL interface to estimate RU usage of known benchmark suites like TPC-C and sysbench (configurable to align more closely with your workload). This estimate of RU capacity for the chosen workload is based on cluster size and available resources. 
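
As a sketch, you can run the calibration directly from a SQL client. The statements below follow the CALIBRATE RESOURCE documentation; the time window in the second form is a made-up example.

-- Estimate the cluster's RU capacity against a TPC-C-like workload profile
CALIBRATE RESOURCE WORKLOAD TPCC;

-- Or estimate capacity from actual usage over a past time window
CALIBRATE RESOURCE START_TIME '2023-06-01 10:00:00' END_TIME '2023-06-01 11:00:00';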

The best way to measure this is to use the newly available metrics that track RU consumption by resource group. You can assign resource groups to your workloads in staging/dev and learn what their RU usage looks like in that environment over some representative time period. From there, it’s trivial to apply quotas to production based on those learnings.
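
For instance, once you know a workload's RU profile, you could define a group and bind an application user to it. This is a minimal sketch; the group and user names are hypothetical, and the syntax follows the resource control documentation.

-- Create a resource group with an RU quota and a priority
CREATE RESOURCE GROUP IF NOT EXISTS rg_oltp RU_PER_SEC = 2000 PRIORITY = HIGH;

-- Bind an application user to the group; its sessions are then governed by the quota
ALTER USER 'app_user'@'%' RESOURCE GROUP rg_oltp;

-- Or switch the current session onto the group
SET RESOURCE GROUP rg_oltp;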

Improved performance stability in multiple hotspot scenarios

Three key enhancements to TiKV, the underlying row storage of TiDB, improve latency stability, which we measure by p99 latency. The optimizations address hotspots at three levels:

  • Key-level hotspots: Lock conflict optimization
  • Shard-level (key region) hotspots: Batching of TiKV sub-tasks
  • Node-level hotspots: Load-based replica reads

Lock conflict optimization (GA)

TiDB has introduced an optimization for handling key-level hotspots. In workloads with many single-point pessimistic lock conflicts, a more stable algorithm for waking up waiting requests minimizes wasted retries, saving resources and reducing tail latency across the cluster. With this optimization enabled, our tests show tail latencies improving by 40-60%, even in the most conflict-heavy workloads, at the cost of a possible small hit to throughput.
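
As a rough sketch, the optimization is toggled through a system variable; our understanding is that recent versions name it tidb_pessimistic_txn_fair_locking, but verify the exact name and default for your version in the docs.

-- Assumed variable name; enables the more stable lock-wakeup algorithm cluster-wide
SET GLOBAL tidb_pessimistic_txn_fair_locking = ON;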


Batching TiKV subtasks (GA)

A query that isn’t highly selective may need to read many keys. In TiDB’s distributed, disaggregated architecture, such a query can mean dispatching requests to many shards spread across the cluster, sometimes resulting in tens or hundreds of thousands of RPCs.

TiDB server, acting as the TiKV client, now recognizes opportunities to batch tasks targeting the same shard and sends those batches to the appropriate storage nodes. This drastically reduces network RPCs, stabilizing performance for these large batch queries. This enhancement can reduce latency in these scenarios by up to 50%.
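
If you need to tune this behavior, our understanding is that the per-store batching of coprocessor tasks is governed by a session variable along the following lines; treat the name and value as assumptions to verify against the docs.

-- Assumed variable name; batch size of coprocessor tasks sent to one store (0 disables batching)
SET SESSION tidb_store_batch_size = 4;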

Load-based replica reads (GA)

This optimization deals with node hotspots. When large batch queries distribute key lookups in a non-uniform way, there may be node hotspots. Something as common as an index lookup JOIN may incur this. When that happens, read requests queue, and when the queue is large, some requests may wait longer. We want to reduce latency in these scenarios by reasonably using the rest of the cluster’s resources to distribute the work.

TiKV has now introduced load-based replica reads to accomplish this. It comes with a configurable duration threshold for the queue that – when exceeded – tells TiDB to begin prioritizing replica reads. This feature can improve latency of hotspot scenarios like this by 70% to 200%.
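
A minimal sketch of adjusting that threshold, assuming the system variable is named as in the 7.1 documentation:

-- If the leader's estimated queue wait exceeds this duration, reads are sent to other replicas
SET GLOBAL tidb_load_based_replica_read_threshold = '1s';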

For more on this optimization, see the documentation.

TiDB 7.1 speeds up workloads with fewer resources 

Features and enhancements in this section speed up writes, reads, and operations for a better user experience. A new type of index was added to enable a category of workloads that was too inefficient before. Additionally, there were big changes and optimizations to write throughput, analytical query speed, and background task efficiency.

Added speed and flexibility with the multi-value index (GA)

This feature adds speed by means of a new index. Also known as a “JSON index”, this new type of secondary index was introduced in TiDB 6.6 and has since become GA. The multi-value index (supported in MySQL) enables an N:1 mapping of index records to data records. This allows queries to quickly check membership of specific values in rows storing JSON arrays.

For example, imagine you are storing data like this in a column:

{
    "user": "Bob",
    "user_id": 31,
    "zipcode": [94477, 94536]
}

Whether that document is stored as a single JSON blob or the zip codes are stored directly as a JSON array, a multi-value index lets you quickly determine which rows contain a specific zip code.

The index is created using an expression that logically parses JSON data into a generated column and secondary index on that column. If you’re storing JSON as a blob and need to support queries that traverse multiple levels of nesting, simply create an index that does this traversal for you and filter on the found values. 
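
As a sketch, a hypothetical table holding the document above could index the zipcode array and answer membership queries like this (the table, column, and index names are illustrative):

-- Store the JSON document in a `profile` column and index the zipcode array
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    profile JSON,
    -- multi-value index: one index record per element of the array
    INDEX idx_zip ((CAST(profile->'$.zipcode' AS UNSIGNED ARRAY)))
);

-- Membership checks can now use the index instead of scanning every row
SELECT id FROM users WHERE 94536 MEMBER OF (profile->'$.zipcode');
SELECT id FROM users WHERE JSON_OVERLAPS(profile->'$.zipcode', '[94477, 94536]');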

For more detailed information on usage and considerations, see the Multi-value index documentation.

Accelerated TTL (GA)

TTL (time to live) was discussed as an experimental feature in our TiDB 6.5 release blog, and this stable release makes it GA. While it already worked reliably in 6.5, the GA version brings better performance and resource utilization by parallelizing TTL jobs across TiDB nodes, so the nodes share the work of issuing the many tasks that scan and delete expired data in the cluster. The number of concurrent tasks is configurable, and the feature ships with a rich set of metrics for proper observability.
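
For illustration, a hypothetical table whose rows expire 90 days after creation might be declared like this; the table and column names are made up, and the TTL options follow the TiDB docs.

CREATE TABLE session_events (
    id BIGINT PRIMARY KEY,
    payload JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) TTL = `created_at` + INTERVAL 90 DAY TTL_ENABLE = 'ON';
-- The concurrency of the background scan/delete workers is adjustable via the
-- tidb_ttl_* system variables (see the TTL documentation for the full list).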

Faster analytics with late materialization

TiFlash, the columnar storage engine for TiDB, makes late materialization GA in this version. When a query’s filter conditions are highly selective, the TiDB optimizer can choose late materialization: TiFlash first pushes down part of the table scan to evaluate those filters, then reads the remaining columns only for the rows that pass them before continuing with the computation. This reduces I/O and data processing, meaning less data is read from disk.

The impact this feature has depends on workload and data distribution. It could significantly reduce latency in some scenarios (up to 70% in our tests). TiDB’s Query Optimizer can decide whether or not to use it, making it safe to have on by default. 
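
If you do need to toggle it, here is a minimal sketch, assuming the variable name matches the 7.1 docs:

-- On by default in 7.1; set to OFF to disable late materialization for TiFlash queries
SET GLOBAL tidb_opt_enable_late_materialization = ON;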

Power up workloads with fewer resources using a new partitioned Raft storage engine (experimental)

In TiDB 6.6, we introduced a major change to the TiKV storage architecture. Although this architecture is still considered experimental (off by default, and can only be enabled in fresh clusters), we have completed major enhancements for this LTS and received strong test feedback in pre-production environments. 

Prior to TiDB 6.6, all key ranges (TiKV “Regions”) on a node shared a single underlying storage engine instance. The new architecture physically isolates the key ranges (you can think of them as shards) and gives each its own instance of the storage engine. The benefits of the new architecture are significant:

  • Separating each shard into its own storage engine on each node makes writes more distributed, increasing throughput. In some tests, write throughput increased by almost 300%.
  • The isolation also mitigates workload interference, most notably from compactions. Compacting one shard’s data does not touch or interfere with another shard’s files, making compactions quieter on the nodes.
  • This change also makes it easier to snapshot and move shards from node to node, making scaling your cluster in or out upwards of 500% faster.
  • In testing some real-world customer workloads, this new architecture also saw tail latencies decrease by almost half.
  • Lastly, this lays the groundwork for more features designed to increase the total possible scale of a single cluster.

In TiDB 7.1, we’ve improved the stability of the performance benefits listed above and added network traffic optimizations. This feature will be considered GA when we add support for it in tooling like TiCDC, Backup & Restore, and Lightning (import).

For more detail, see the documentation on this architecture.

Huge improvements to online DDL (experimental)

In the TiDB 6.5 release blog, we introduced accelerated ADD INDEX, which increased the speed of that operation by 10x. With TiDB 7.0, we made this feature GA by making it compatible with point-in-time recovery (PITR). In the stable TiDB 7.1, similar to the TTL improvement above, we introduce a framework for distributing DDL jobs across TiDB nodes to increase performance even further.
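
As a hedged sketch: the fast index build is controlled by tidb_ddl_enable_fast_reorg (on by default since 6.5), while the experimental distribution of the reorg work across TiDB nodes is gated by a separate switch that we understand to be tidb_enable_dist_task in 7.1; verify both names against the docs. The table and index below are hypothetical.

SET GLOBAL tidb_ddl_enable_fast_reorg = ON;  -- accelerated ADD INDEX
SET GLOBAL tidb_enable_dist_task = ON;       -- assumed name: distribute the DDL job across TiDB nodes (experimental)

ALTER TABLE orders ADD INDEX idx_customer (customer_id);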

TiDB 7.1: Get started and learn more

We’re excited to deliver these innovative enterprise-grade features in TiDB 7.1, the latest LTS version of our flagship database. Discover more, or try TiDB today:

  • Download TiDB 7.1 LTS—installation takes just a few minutes.
  • Join the TiDB Slack Community for interactions with other TiDB practitioners and experts, as well as real-time discussions with our engineering teams.
