How TiDB Achieves 10x Performance Gains in Online DDL

Table schema changes are frequent in database management due to changing applications, query workloads, and data formats. These changes can take hours or days to complete, and can hog system resources and reduce application throughput. As tables get larger, making changes through data definition language (DDL) operations takes longer. This poses potential risks on applications and could severely impact business growth.

In this post, we will share how TiDB’s Fast Online DDL achieves a 10x performance improvement in online index creation. We’ll also provide a TiDB DDL performance comparison with Amazon Aurora and CockroachDB.

How does Fast Online DDL work in TiDB ?

Customers want fast online DDL operations that don’t affect throughput. As a distributed SQL database, TiDB 2.1 introduced the Online Asynchronous Schema Change algorithm based on Google’s F1 paper. This implementation allows online table schema changes without blocking read/write operations. This completely eliminates the need for external schema change tools or switching between sources and replicas. Evolving upon this foundation, TiDB 6.5 Long Term Support (LTS) release introduced Fast Online DDL.

After analyzing TiDB’s original online DDL approach, we found that the slowest part of the process was the full table scanning process for index data creation. The major bottleneck in index creation was the method of batch-filling an index based on transaction results. Therefore, we decided to resolve this performance issue by improving the process in three aspects: index creation mode, data transmission, and parallel import.

Transform the index creation mode

To understand the rationale behind the change, let’s take a look at the following comparison diagram:

Fig 1. Index creation mode before and after Fast Online DDL

The diagram on the left illustrates TiDB’s original approach of index creation during the online DDL. The original approach involves scanning the entire table data to construct the index records, which are represented as key-value (KV) pairs. These KV pairs are then committed to TiKV in transactions, with a batch size set by tidb_ddl_reorg_batch_size. However, this approach negatively impacted the performance with two types of overheads:

The time overhead brought by the two-phase committing (2PC) process for index records. As each index transaction usually has a data batch size of 256 rows or less, the total transaction time for committing a large number of index records back to TiKV could be significant.
The time overhead caused by rolling back and retrying due to conflicts between index transactions and user transactions. When the index records that are committed in transactions conflict with the index records updated by user transactions, it triggers a rollback and retry of either the user transaction or index backfilling transaction. This ultimately affects performance.

To address these issues, we have improved the original approach’s transactional batch-write mode by adopting a file-based batch import mode, as shown on the right side of the above diagram. We still begin with scanning the entire table data and then construct index KV pairs based on the scanned data and store them in TiDB’s local storage. Once the index KV pairs are sorted in TiDB, they are written back to TiKV using the ingest mode. This new method eliminates the above two types of overhead and significantly improves the performance.

Optimize data transmission

In the original approach to index creation, TiDB returns the entire row of data during table scanning. It then selects the columns required for the index to construct the index’s KV pairs. In the new approach, the required columns for the index are extracted before the data is returned from TiKV to TiDB. This greatly reduces the overall data transfer volume and index creation time.

Parallel index ingesting

Finally, the index records are ingested into TiKV in parallel to enhance the efficiency of writing data back to TiKV. However, it does increase the online processing load on TiKV. Currently we are developing a series of flow control measures to fully utilize TiKV’s idle bandwidth without causing stress on normal load processing loads.

DDL benchmarking: TiDB vs. popular databases

To validate our work, we conducted benchmark tests using Sysbench to compare the efficiency of DDL execution in the most common scenario of index creation. We compared TiDB 6.1.0, Aurora 3.02.2, CockroachDB 22.2, and TiDB 6.5.0 with an approximate cost on the cloud. The benchmark would measure the performance improvement ratio when creating a secondary index on a specific INT field for tables with different data volumes. The testing configurations are as follows:

Database	Version	Cluster configuration	Cost ($/hr)
TiDB	6.1.0/6.5.0	TiDB: (16c core + 32 GB) * 2 TiKV: (16c core + 64 GB + 1 TB) * 9	21.32
Aurora	3.02.2	db.r5.8xlarge * 2	21.00
CockroachDB	22.2	(16c core + 60 GB + 1 TB) * 6	21.17

When the benchmark runs online index creation with an idle workload, TiDB 6.5.0 demonstrates a performance 10x faster than its predecessor, TiDB 6.1.0. It is also 2~3x faster than CockroachDB 22.2 and approximately ~2.7x faster than Aurora 3.02.2.

Fig 2. Acceleration ratio of Fast Online DDL with idle workload.

When the benchmark runs online index creation with the sysbench OLTP_READ_WRITE scenario of 10K QPS, TiDB 6.5.0 outperforms TiDB 6.1.0 by 8~13x and CockroachDB 22.2 by approximately 3x. However, we did not run the same test on Aurora due to the potential interruption of database connections on associated Aurora Replicas when running DDL statements on the primary instance. Furthermore, Aurora users prefer to use tools such as gh-ost, pt-osc, or osc to perform the schema changes.

Fig 3. Acceleration ratio of Fast Online DDL with heavy workload.

The following chart shows how the performance varies with Fast Online DDL under different concurrency configurations. (The tidb_ddl_reorg_worker_cnt parameter was set to 1, 2, 4, 8, and 16 respectively.)

Fig 4. Acceleration ratio of Fast Online DDL on TiDB 6.5.0 vs. TiDB 6.1.0.

Conclusion

Through a combination of a new batch import mode, optimized data transmission, and parallel import, TiDB 6.5’s Fast Online DDL feature has significantly accelerated online DDL operations and made them more efficient.

You can get started with TiDB 6.5 LTS to experience Fast Online DDL or request a demo from our technical experts. Additionally, join our community on Slack as well as the TiDB Forum to share your thoughts and feedback.

Book a Demo