We are excited to announce our latest preview release, TiDB 7.0. This release provides a first look into the latest features and innovations planned for our next production-ready, long-term stable (LTS) release slated for the middle of 2023. It’s intended for development, testing, and proof-of-concept projects.
TiDB 7.0 spotlights our continued focus on helping you grow your business with reliable performance and streamlined database operations. This allows you to meet high customer expectations and accelerate the productivity of your developers and IT operators.
Read on to discover the new enhancements released in TiDB 7.0 and the features soon to be available for production deployments. For more detailed information, check out the Release Notes.
Add workload stability with Resource Control
Introduced in TiDB 6.6, this feature is unique to TiDB and lays the groundwork for stable multi-tenancy. In other words, TiDB customers can safely offer multi-tenancy to their own customers.
This feature lets you set resource ceilings for groups of one or more sessions. If consumption from one workload or application becomes anomalously heavy, it is capped at its group’s ceiling and cannot take resources it is not allowed to use. This prevents interference with other, possibly more critical, workloads.
Below is a non-exhaustive list of business cases for this feature:
- Users can combine multiple applications into a single TiDB cluster, reducing total cost of ownership (TCO) while still guaranteeing resources for possible new workloads.
- Users can safely run batch jobs during business hours.
- Staging environments can share a single TiDB cluster with managed resource limits.
Sessions can be bound to resource groups in three ways:
- At the user definition level such that users’ sessions will always be subject to the set boundaries.
- At the session level ad hoc.
- At the SQL statement level via hint.
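To make the three binding levels concrete, here is a minimal sketch that renders the corresponding SQL statements as strings. It does not connect to a cluster. The statement forms follow the resource control syntax as we understand it from the TiDB documentation, so check them against the docs for your version. The group name `rg_batch`, the user `batch_user`, and the table `orders` are hypothetical.

```python
# Illustrative only: these helpers build SQL text and never talk to a database.
# Group, user, and table names are hypothetical placeholders.

def create_resource_group(group, ru_per_sec):
    """Define a group whose ceiling is expressed in request units per second."""
    return f"CREATE RESOURCE GROUP IF NOT EXISTS {group} RU_PER_SEC = {ru_per_sec};"

def bind_user(user, group):
    """User-level binding: the user's sessions are subject to the group."""
    return f"ALTER USER '{user}' RESOURCE GROUP {group};"

def bind_session(group):
    """Session-level binding: applies ad hoc to the current session."""
    return f"SET RESOURCE GROUP {group};"

def bind_statement(group, query):
    """Statement-level binding: place a hint right after the leading keyword."""
    keyword, rest = query.split(" ", 1)
    return f"{keyword} /*+ RESOURCE_GROUP({group}) */ {rest}"

statements = [
    create_resource_group("rg_batch", 500),
    bind_user("batch_user", "rg_batch"),
    bind_session("rg_batch"),
    bind_statement("rg_batch", "SELECT * FROM orders"),
]
for statement in statements:
    print(statement)
```

The user-level form suits long-lived tenants, while the session and hint forms are handy for one-off jobs that should not consume a more critical workload's budget.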
Stay tuned for a blog post that covers this feature in more detail!
Stabilize analytical workloads with TiFlash Spill to Disk
TiFlash is the columnar storage and compute engine for TiDB. It serves as the backbone of the database’s analytical workload capability. Prior to TiDB 7.0, TiFlash processed all data in memory. As of this version, intermediate results may be spilled to disk, making storage an effective extension of available memory for large queries. The idea behind this feature is to trade some performance of large single queries for overall stability of analytical queries. Analytical stability makes these clusters more predictable.
This spilling to disk happens according to user-configurable parameters and applies to individual, pushed-down operations. Since this optimization takes place at the individual operator level, there are multiple places where this spill-to-disk functionality must be implemented. In TiDB 7.0, it has first been implemented for:
- Hash joins on equality conditions.
- Hash aggregations on GROUP BYs.
- TopN and sort operators in Window functions.
During these operations, if the amount of memory used by an operator exceeds the configured limit, the operator spills data to disk instead of failing and then continues with subsequent processing.
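As a rough illustration of the user-configurable limits, the sketch below converts a memory limit into bytes and renders a `SET` statement for each of the three operator families. The system variable names are the ones we understand the TiDB documentation to use for TiFlash spilling, so verify them for your version. Treating the limit as GiB is an assumption of this example.

```python
GIB = 1024 ** 3

# Assumed variable names; confirm against the spill to disk documentation.
SPILL_VARIABLES = {
    "join": "tidb_max_bytes_before_tiflash_external_join",
    "group_by": "tidb_max_bytes_before_tiflash_external_group_by",
    "sort": "tidb_max_bytes_before_tiflash_external_sort",
}

def spill_threshold(operator, limit_gib):
    """Render a SET statement that caps an operator's memory at limit_gib GiB."""
    variable = SPILL_VARIABLES[operator]
    return f"SET {variable} = {int(limit_gib * GIB)};"

for operator in SPILL_VARIABLES:
    print(spill_threshold(operator, 10))
```

A lower threshold trades query speed for a smaller memory footprint, which is the trade-off this feature is designed around.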
To illustrate the impact on target workloads, we ran a test with TPC-H, a benchmark designed to simulate decision support systems. The results are in the table below:
| Configuration | Execution time | Peak memory usage | Spilled partitions | Total partitions |
|---|---|---|---|---|
| Spill not enabled | 9.17s | 42.09 GiB | 0 | 40 |
| Spill enabled, but not triggered | 9.10s | 42.08 GiB | 0 | 40 |
| Max join memory usage set to 10G | 15.45s / 25.50s (concurrent restore disabled) | 30.63 GiB | 21 | 40 |
| Max join memory usage set to 5G | 21.86s / 32.62s (concurrent restore disabled) | 18.82 GiB | 32 | 40 |
To learn more about this feature, visit the spill to disk documentation.
Reduce query latency with Automatic Plan Cache
TiDB 7.0 introduces an automated plan cache. Prior to this version, plan caches were supported, but only in the form of prepared statements (using PREPARE). This version introduces the framework for caching any query automatically.
Important things to note about the experimental stage of this feature:
- This implementation caches queries at the session level as opposed to globally. A global cache is planned for a future release. Session-level caching avoids the latency of finding the right plan in a larger shared cache, but it may increase overall memory usage because the same plan can be duplicated across sessions.
- This implementation has limited coverage. Today, the feature can cache single-table filter and range queries. Notable types that it cannot cache are complex single-table queries and JOIN queries.
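The session-level trade-off described above can be pictured with a toy model. This is not TiDB's implementation: the regex-based `parameterize` helper and the `SessionPlanCache` class are hypothetical names invented for this sketch, and the "plan" is just a string. It only shows that queries differing in literals can share one cached entry, and that separate sessions each hold their own copy.

```python
import re

def parameterize(sql):
    """Replace single-quoted strings and standalone integers with '?'."""
    return re.sub(r"'[^']*'|\b\d+\b", "?", sql)

class SessionPlanCache:
    """Toy per-session cache from a parameterized query to a stand-in plan."""

    def __init__(self):
        self.plans = {}
        self.hits = 0

    def get_plan(self, sql):
        key = parameterize(sql)
        if key in self.plans:
            self.hits += 1
        else:
            # Stand-in for the expensive optimization step.
            self.plans[key] = f"plan for: {key}"
        return self.plans[key]

session_a, session_b = SessionPlanCache(), SessionPlanCache()
session_a.get_plan("SELECT * FROM t WHERE a = 1")
session_a.get_plan("SELECT * FROM t WHERE a = 2")  # reuses session_a's entry
session_b.get_plan("SELECT * FROM t WHERE a = 3")  # session_b builds its own copy
print(session_a.hits, session_b.hits, len(session_a.plans) + len(session_b.plans))
```

In this model, the two sessions together store two copies of the same plan, which is the duplication a global cache would avoid.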
For an architectural diagram and further information on this feature, refer to the non-prepared execution plan cache documentation.
Improve elasticity with TiFlash Re-Architecture
There are two core properties of TiFlash relevant to the architectural changes:
- TiFlash holds columnar copies of data in TiKV.
- Storage and query processing are coupled in each node (unlike the separation between TiDB and TiKV).
The TiFlash re-architecture comes in two forms:
- Disaggregating storage and compute.
- Offloading storage to object storage (S3 in this version).
The first makes storage and compute scalable independently of each other. That means no more adding storage when you need to scale queries, and no more adding compute when you need to increase write throughput. Additionally, writes and data management won’t get in the way of reads (and vice versa).
This experimental feature substantially reduces the added cost of using TiFlash to support analytical workloads, making this portion of your cluster much more cost-effective. It also provides a level of workload isolation that can improve stability.
TiUP and the K8s operator already support deploying and scaling these now-independent components.
For more information on this feature, refer to the TiFlash disaggregated storage and compute architecture and S3 support documentation.
Automatically expire and delete data with TTL Tables
TiDB 6.5 introduced “Time to live” (TTL), a highly requested feature. Now generally available (GA) in TiDB 7.0, TTL is a SQL-configured way of setting an expiry time on rows in a table. With it, operators can automate the deletion of data they consider too old.
Operators and users of a database may want to automatically expire and delete rows of a table for reasons having to do with cost, performance, or security. More specifically:
- Larger tables can mean longer queries in some cases.
- Larger tables mean more storage cost.
- In TiDB, the larger a table is, the more key ranges (“regions”) it has, so limiting table size can reduce the load on the Placement Driver (PD), the brain of the system.
- Compliance requirements may mandate that data expire.
Prior to TTL, TiDB users had to choose between letting tables grow indefinitely, manually deleting rows, or building and maintaining their own delete automation. With TTL built into TiDB, users no longer face that choice, or the work of implementing deletion and making it perform well.
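For a sense of what "SQL-configured" means here, the sketch below renders a table definition with a TTL attribute and a statement that changes the TTL later. The clause shape follows the TTL syntax as we understand it from the TiDB documentation. The table name `events`, the column `created_at`, and the two-column schema are hypothetical.

```python
# Illustrative only: builds SQL text for a hypothetical table.

def ttl_clause(column, amount, unit):
    """Rows expire once `column` plus the interval is in the past."""
    return f"TTL = `{column}` + INTERVAL {amount} {unit}"

def create_ttl_table(table, column, amount, unit):
    """Render a minimal two-column table with a TTL attribute."""
    return (
        f"CREATE TABLE {table} (id BIGINT PRIMARY KEY, {column} TIMESTAMP) "
        f"{ttl_clause(column, amount, unit)};"
    )

def alter_ttl(table, column, amount, unit):
    """Render a statement that changes the expiry interval of an existing table."""
    return f"ALTER TABLE {table} {ttl_clause(column, amount, unit)};"

print(create_ttl_table("events", "created_at", 3, "MONTH"))
print(alter_ttl("events", "created_at", 1, "MONTH"))
```

Because the expiry rule lives in the table definition, no external scheduler or hand-written delete job is needed.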
To learn more about how to use this feature, visit the TTL documentation.
Increase scalability with Key Partitioning
Prior to 7.0, TiDB supported HASH, RANGE, and LIST partitioning. This version introduces KEY partitioning. Like hash partitioning, key partitioning can be used to distribute keys around nodes in the cluster to guard against hotspotting. Unlike hash partitioning, however, key partitioning is not limited to integer expressions/columns but supports distributing data based on non-integer columns and lists of columns.
Key partitioning provides a more flexible way to distribute data sets to improve scalability of the cluster.
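As a small sketch of the difference from hash partitioning, the helper below renders a table partitioned by KEY on a list of non-integer columns, using the MySQL-compatible `PARTITION BY KEY` form. The table and column names are hypothetical. Making the partitioning columns the primary key is a choice of this example, made so that they are part of the table's unique key.

```python
# Illustrative only: builds a CREATE TABLE statement for a hypothetical schema.

def key_partitioned_table(table, columns, key_columns, partitions):
    """Render a table partitioned by KEY on one or more columns.

    columns: dict of column name -> SQL type, in definition order.
    key_columns: list of column names used for both the primary key
                 and the partitioning key.
    """
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    keys = ", ".join(key_columns)
    return (
        f"CREATE TABLE {table} ({column_defs}, PRIMARY KEY ({keys})) "
        f"PARTITION BY KEY ({keys}) PARTITIONS {partitions};"
    )

print(
    key_partitioned_table(
        "sessions",
        {"tenant": "VARCHAR(32)", "token": "VARCHAR(64)", "payload": "JSON"},
        ["tenant", "token"],
        8,
    )
)
```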
Adapt to changing requirements with REORGANIZE PARTITION
TiDB has supported partitioning for a long time. Until now, however, the only ways to modify the partitioning of a partitioned table were to add or drop partitions and to truncate LIST/RANGE partitions.
TiDB 7.0 makes REORGANIZE PARTITION a generally available feature, which allows for online merging, splitting, and other changes to LIST/RANGE partitions.
This makes it easier to adapt a table’s partitioning as requirements change.
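To show what merging and splitting look like, here is a minimal sketch that renders `ALTER TABLE ... REORGANIZE PARTITION` statements for a RANGE-partitioned table, using the MySQL-compatible form of the statement. The table `orders`, the partition names, and the boundary values are hypothetical.

```python
# Illustrative only: builds SQL text for a hypothetical RANGE-partitioned table.

def reorganize_range_partitions(table, old_partitions, new_partitions):
    """Replace old_partitions with new_partitions.

    old_partitions: list of existing partition names.
    new_partitions: list of (name, upper_bound) pairs, in ascending order.
    """
    old_names = ", ".join(old_partitions)
    definitions = ", ".join(
        f"PARTITION {name} VALUES LESS THAN ({bound})" for name, bound in new_partitions
    )
    return f"ALTER TABLE {table} REORGANIZE PARTITION {old_names} INTO ({definitions});"

# Merge two adjacent partitions into one.
print(reorganize_range_partitions("orders", ["p2021", "p2022"], [("p2021_2022", 2023)]))
# Split one partition into two.
print(reorganize_range_partitions("orders", ["p2023"], [("p2023h1", 202307), ("p2023h2", 202401)]))
```

The same statement shape covers both directions: the number of partitions on each side of `INTO` decides whether it is a merge or a split.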
Import data from remote stores with LOAD DATA
LOAD DATA was already a supported SQL-based interface for importing data into TiDB, and TiDB 7.0 adds substantial functionality to it. The LOAD DATA interface now supports importing data from remote stores. In addition, several operational conveniences were added, including but not limited to detecting potential downstream issues that need to be fixed prior to import, running “detached” in the background, and querying the status of import jobs.
Get started and learn more
We’re excited to deliver these innovative enterprise-grade features in TiDB 7.0, the latest preview version of our flagship database. Discover more, or try TiDB today:
- Download TiDB 7.0—installation takes just a few minutes.
- Join the TiDB Slack Community for interactions with the active TiDB community, and real-time discussions with our engineering teams.