In this blog, we introduce TiDB Index Advisor—or TiAdvisor for short—a novel tool that automates index discovery to enhance user workload performance. We also outline the techniques employed in building this tool and provide use cases that validate its results.
Why Do We Need a TiDB Index Advisor?
Database indexes provide faster access to data. They are also crucial for database performance. However, selecting the appropriate indexes is a challenging task for database developers as it can consume days to weeks.
TiDB comes equipped with efficient processing of diverse workloads, seamlessly handling both online transaction processing (OLTP) and online analytical processing (OLAP) tasks. Row-based storage with TiKV in TiDB efficiently manages OLTP workloads. These workloads typically involve a small number of rows from tables. The performance of these workloads relies heavily on indexes in TiKV tables.
Manual index selection is a complex task because:
- The extra maintenance cost of indexes and the performance benefit they provide force tradeoffs.
- It requires expert knowledge of query optimization.
- It’s hard to apply to a workload and usually done for one query at a time.
- It usually takes days to weeks of trial and error to get the best overall solution.
With TiDB, we addressed these challenges with TiAdvisor. This tool considers whole workloads and then recommends indexes that help the overall performance in minutes.
Inside TiDB Index Advisor’s Architecture
There could be an exponential number of possible indexes for a given workload. For example, take a table of five columns with different orders and indexes. That means there are five possible single-column indexes, 20 possible two-column indexes, 60 possible three-column indexes, and 120 possible four- and five-column indexes. That means 325 different indexes can be created just for this table! It can be even greater if you consider multiple tables in the workload.
How Does TiAdvisor Work?
TiAdvisor first scans the SQL queries in the workload and collects columns used in predicates, group-by, and order-by clauses. Other columns are excluded, since indexing them does not help the performance of the SQL queries. With these candidate columns, TiAdvisor avoids going through the search space exhaustively. It applies a simple genetic algorithm that finds first single-column indexes and then picks those that help the workload overall. TiAdvisor applies a second round of search for two-column indexes and so on. It does this up to a certain threshold on number indexes or maximum size of individual indexes.
TiAdvisor then verifies if a certain index is useful or not by performing a “what if” scenario with the candidate index. This creates a hypothetical index (i.e., no materialization and just metadata). It then computes the cost of the whole workload with and without the index. The index joins the recommendation if the workload cost is lower with the index and ignored otherwise. An API with TiDB optimizer handles the cost computation of the workload.
The diagram below shows the architecture of TiAdvisor detailing the tool’s input and output. It also reveals how it uses TiDB to create hypothetical indexes and receive the cost of queries.
Figure 1. A diagram of TiAdvisor’s architecture.
The tuned workload becomes the input along with the metadata of the database: Table schemas and table statistics. The output of TiAdvisor is a list of DDL statements for creating the recommended indexes.
How to Use TiDB Index Advisor
TiAdvisor Beta is available to the public. You can install it using the below linux shell command:
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/pingcap/index_advisor/main/install.sh | sh
Users can try “index-advisor –help” to check how the tool can be used. Additionally, GitHub source code is available but not needed for running the tool.
TiAdvisor runs in two modes: Online (recommended) and Offline. The Online mode is used when TiAdvisor has access to the cluster that has the relevant workload data. This mode is useful for users who like to run TiAdvisor on their cluster and get an index recommendation.
Getting Started with an Example
Here’s an example of running TiAdvisor in Online mode:
index-advisor advise-online --query-schemas='tuner_db'
This example locates index recommendations for the TPC-H standard benchmark. Again, you can find the code on GitHub. The example requires that some sample TPC-H data loads under tuner_db. It also requires that all the 22 TPC-H workload queries get written in this directory
Here are some additional details about the options used in this example:
advise-onlinedenotes the Online option.
--query-schemasspecifies the database that has the schema of the workload queries.
--query-pathspecifies the directory for all the workload queries.
--dsnspecifies cluster information.
--outputlists where the tool output should go.
The DDL output of this example is DDL for five indexes, as shown in the below figure.
Figure 2. Index recommendation for TPC-H workload.
Users can implement Offline mode when they want to run the tool outside their production clusters. However, you first need to complete a prep step that exports and imports the workload metadata. This metadata includes database schema, table statistics, as well as importing the same information to another test cluster. An example of this mode can be seen in the following example.
For starters, we’ll export the workload metadata:
--dsnspecifies cluster information
--status_addressis the status address of the cluster, which is used to dump statistics.
--outputspecifies the directory to store workload metadata.
Finally, we’ll run TiAdvisor in offline mode using the exported metadata:
index-advisor advise-offline --dir-path=examples/offline/export
--dir-pathspecifies the workload metadata path.
--max-num-indexesspecifies the maximum number of indexes that TiAdvisor will recommend.
--outputlists where the tool output should go.
TiDB Index Advisor Success Stories
TiAdvisor went through extensive testing before releasing in Beta. We ran TiAdvisor on TPC-H. This improved the workload by 16% even though this benchmark is more OLAP centric, and indexes are not crucial to such environments.
We also ran TiAdvisor through the join-order-benchmark (JOB). This allowed us to improve performance by 46% with five recommended indexes. This improvement is more significant than TPC-H, as the JOB benchmark exhibits a greater diversity in query workload complexity. The below screenshot shows JOB query 6a with and without indexes as an example of the impact on TiAdvisor.
Figure 3. A JOB query example.
The original query plan does a full table scan on two large tables “mk” and “ci”. At the same time, the optimized query uses index access, reducing the query latency by more than 99%.
In addition to the above benchmarks, we also ran TiAdvisor in a TiDB Web3 customer environment that is sensitive to efficient use of indexes. The tool ran in minutes and produced comparable performance to a manual process that took a couple of weeks!
The figure below depicts the performance of the customer workload with no indexes, with user-defined indexes, and with TiAdvisor indexes. The TiAdvisor result is more succinct than the manual process, since the latter can potentially include redundant indexes.
Figure 4. Estimated total workload cost in seconds.
As demonstrated, TiAdvisor can tackle the intricate challenges faced by users in identifying optimal indexes to unlock their applications and enhance system performance. The tool is currently available in Beta for the general public. It has also been thoroughly tested with standard benchmarks and in customer production systems.
In the future, we welcome community feedback to refine and enrich TiAdvisor. Our roadmap highlights upcoming improvements, such as enhancing the tool’s search space and refining the cost model. Additionally, we intend to integrate TiAdvisor into TiUP package manager and TiDB Cloud, a fully-managed database-as-a-service (DBaaS) offering of TiDB.
A fully-managed cloud DBaaS for predictable workloads
A fully-managed cloud DBaaS for auto-scaling workloads