Online Data Definition Language (DDL) is a crucial feature for modern databases and a cornerstone of MySQL modernization strategies. It allows schema changes without the significant downtime or locking that would otherwise disrupt database operations: changes are carried out while the database remains available for reads and writes, so ongoing activity continues uninterrupted.
Online DDL is particularly important for applications that require high availability and where maintenance windows are not feasible or are too disruptive. It is essential for large, operational databases that need to scale, evolve, and undergo schema changes without affecting user experience or service availability.
In this blog post, we’ll explore what online DDL is and then dive into how TiDB, a distributed SQL database, efficiently handles online DDL operations.
What Is Online DDL—and Why It Matters for Modern MySQL Systems
Online DDL is the ability to evolve a schema—add columns, create or drop indexes, change constraints—while the database stays read/write available. Instead of blocking tables, the system performs changes concurrently with live traffic, which preserves uptime, protects p95/p99 latency, and improves online DDL performance by avoiding long metadata locks that stall queries or applications.
Building on the Google F1 approach, online DDL is split into safe transition stages (e.g., prepare → backfill/validation → publish) so reads/writes continue as the change progresses. Data backfills run in the background with rate limits, new structures (like “shadow” indexes) are built alongside existing ones, and metadata is switched atomically only at the end—so nodes can lag by at most one DDL version without corruption or outages. The result is predictable performance during schema evolution, even on large tables under production load, especially with TiDB DDL optimizations that parallelize and throttle backfills safely.
From Locking to Non-Blocking: The Evolution of DDL Operations
- Non-blocking Operations: Avoids long-term locks on tables.
- Minimal Downtime: Keeps the database available during schema changes.
- Concurrency: Allows simultaneous DML (Data Manipulation Language) operations.
- Phased Approach: Performs changes in multiple stages to reduce impact.
Planning a MySQL migration? It helps to understand where native MySQL shines and where a distributed engine adds guarantees.
How MySQL Handles Online DDL: Strengths and Trade-offs
MySQL’s online DDL behavior is controlled primarily by the ALGORITHM and LOCK clauses of ALTER TABLE (a combined example follows the lists below):
- Algorithms:
- ALGORITHM=COPY rebuilds the table by making a full copy. It’s the most broadly compatible approach but also the most disruptive and the slowest.
- ALGORITHM=INPLACE changes structures alongside the live table (e.g., creates a “shadow” index, backfills in the background) and then switches metadata—no full table copy.
- ALGORITHM=INSTANT (MySQL 8.0+) updates only metadata for supported operations (for example, add a column at the end). There’s no data backfill or table rebuild, so it completes almost immediately.
- Concurrency (LOCK):
- LOCK=NONE keeps read/write traffic flowing during most of the operation, taking only brief metadata locks at start/commit.
- LOCK=SHARED permits concurrent reads but blocks writes for the duration of the operation.
- LOCK=EXCLUSIVE blocks both reads and writes for the duration (typically paired with COPY).
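To make this concrete, here’s a minimal sketch of how the clauses combine. The table and column names (`orders`, `created_at`, `note`) are illustrative, and whether each combination is accepted depends on your MySQL version and the specific operation:

```sql
-- Build a secondary index in place while reads and writes continue;
-- MySQL rejects the statement if this combination isn't supported.
ALTER TABLE orders
  ADD INDEX idx_created_at (created_at),
  ALGORITHM=INPLACE, LOCK=NONE;

-- Metadata-only column addition (MySQL 8.0+); completes almost instantly.
ALTER TABLE orders
  ADD COLUMN note VARCHAR(255) DEFAULT NULL,
  ALGORITHM=INSTANT;
```

Requesting ALGORITHM and LOCK explicitly is a useful guardrail: if MySQL can’t honor the requested level, the statement fails immediately instead of silently falling back to a more disruptive mode.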
Here’s what the phased flow (INPLACE) looks like:
- Prepare: Acquire a short metadata lock, record the planned change, and start background workers.
- Reorg (background): Build the new structure (e.g., index backfill) while DML continues; changes are tracked so new/updated rows are included.
- Commit: Take a brief metadata lock to atomically switch pointers to the new structure; release.
With ALGORITHM=INSTANT, supported alterations (notably ADD COLUMN, plus other metadata-only changes such as setting a column default) finish in milliseconds because only the data dictionary changes. There’s no table copy, no backfill, and the operation typically holds just a short metadata lock. Limitations apply (e.g., before MySQL 8.0.29 the new column must be appended at the end of the table; certain data type and position changes still require INPLACE or COPY).
Picking the right algorithm + lock level lets you evolve schemas with minimal latency impact and near-zero downtime.
Understanding Metadata Locks and Non-Blocking Schema Changes
MySQL’s InnoDB engine allows many schema changes without blocking access to the table. For instance, adding a column or creating an index is done without locking the entire table, as shown below.

- Metadata Update: MySQL updates the table’s metadata to include the new column.
- Background Processing: The InnoDB engine performs the addition of the new column in the background. Existing rows are updated incrementally to include the new column without blocking reads or writes.
- Concurrent Operations: While the background task runs, the table remains fully accessible. New writes include the new column immediately, and existing data is updated gradually.
Online Index Creation and Instant DDL
MySQL supports creating indexes concurrently, meaning the table remains available for reads and writes while the index builds.

- Metadata Preparation: MySQL updates the table metadata to include the index definition.
- Index Build Process: The InnoDB engine builds the index in the background. It reads existing table data and populates the index incrementally.
- Minimal Disruption: During the index build, the table remains accessible. Queries and updates proceed without being blocked, and the new index is used only after it is fully built and activated.
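If you need to watch a long-running index build, InnoDB reports progress through Performance Schema stage events. A minimal sketch, assuming the `stage/innodb/alter%` instruments are available in your build:

```sql
-- One-time setup: enable the ALTER TABLE stage instruments and consumers.
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES' WHERE NAME LIKE 'stage/innodb/alter%';
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES' WHERE NAME LIKE '%stages%';

-- While the ALTER TABLE or CREATE INDEX runs in another session,
-- poll the estimated progress of the build.
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED
FROM performance_schema.events_stages_current;
```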
How TiDB Redefines Online DDL: Scalable, Distributed, and Instant
MySQL’s best-case online DDL can be excellent—especially with ALGORITHM=INSTANT or INPLACE—but it’s fundamentally single-node concurrency with brief metadata locks and table-local backfills. TiDB inherits the online benefits and then extends them to a distributed SQL setting: schema change is coordinated across many nodes, backfills are parallelized and throttled cluster-wide, and reads/writes keep flowing without hotspots or long metadata locks.
As mentioned earlier, TiDB implements a three-phase schema change protocol—Prepare → Reorganize → Commit—adapted for a multi-node cluster.
- Prepare: A DDL owner acquires a short metadata lease, records the intent, and installs new descriptors (e.g., shadow index/column) so DML can continue safely.
- Reorganize: Background workers backfill and validate in parallel across regions/stores while normal reads/writes proceed under MVCC; rate limits protect p95/p99 latency.
- Commit: The cluster performs an atomic metadata switch to activate the new structure, with only a brief, distributed metadata lock—nodes may lag by at most one DDL version and catch up without blocking traffic.
Instead of pausing tables, TiDB uses a cluster-elected DDL owner, metadata versioning, and MVCC to ensure writers/readers see consistent schemas while work progresses. The scheduler spreads backfill across leaders to avoid hot stores, and built-in throttles keep OLTP steady during heavy change.
What you get in practice:
- High availability: Continuous reads and writes during schema evolution, with only momentary metadata coordination.
- Scalability: Parallel, region-aware backfills and validation for large, distributed datasets.
- Operational flexibility: Predictable tail latency and no maintenance windows for common changes—so you can ship schema alongside features, not instead of them.
TiDB’s Three-Phase Schema Change Protocol Explained
TiDB employs a three-phase approach to handle online DDL operations:
1. Prepare Phase:
- TiDB updates metadata and prepares for the schema change.
- Ensures the change propagates safely across the cluster.
2. Reorganization Phase:
- The actual changes are applied in the background.
- For example, building a new index or adding a column incrementally.
- Allows concurrent DML operations, ensuring the database remains fully operational.
3. Commit Phase:
- The schema change is finalized and made active.
- Metadata is updated to reflect the new schema, making the change visible.
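These phases are observable from SQL. TiDB exposes the DDL job queue through ADMIN statements; a quick sketch (the job ID is illustrative):

```sql
-- Show the currently running DDL job and which node owns it.
ADMIN SHOW DDL;

-- List recent DDL jobs with their state and how many rows
-- the reorganization has processed so far.
ADMIN SHOW DDL JOBS;

-- Cancel an in-flight job by its job ID if something goes wrong.
ADMIN CANCEL DDL JOBS 123;
```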
Step-by-Step: How TiDB Executes Instant DDL Changes
In this section, we’ll walk through common online DDL operations in TiDB: adding a column, creating an index, and modifying a column. Each follows the same three steps:
- Metadata updates
- Background processing
- Final commit
Adding Columns with Zero Downtime
When adding a column in TiDB, the process avoids table locks and ensures continuous availability. Here’s how it works technically:
- Prepare Phase: TiDB updates the table’s metadata to include the new column definition. This change propagates across all nodes in the cluster to ensure consistency.
- Reorganization Phase: The new column is added in the background. TiDB uses a background task to backfill the column data for existing rows incrementally. During this phase, the table remains accessible for reads and writes, with new writes including the new column data.
- Commit Phase: Once the backfill is complete, TiDB finalizes the schema change. The metadata is updated to reflect the new schema, making the new column fully active and visible to all subsequent operations.
This approach allows TiDB to add a column without long-term locking, ensuring minimal disruption to ongoing operations.
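In practice, none of this requires special syntax. A minimal sketch with an illustrative `users` table:

```sql
-- Plain MySQL-compatible DDL; TiDB runs Prepare -> Reorganize -> Commit
-- behind the scenes while reads and writes continue.
ALTER TABLE users ADD COLUMN last_login TIMESTAMP NULL;

-- On large tables, check how far the background work has progressed.
ADMIN SHOW DDL JOBS;
```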

Creating Indexes Instantly in TiDB
Creating an index in TiDB is another common operation that benefits from online DDL. The process ensures that queries and updates continue smoothly:
- Prepare Phase: TiDB prepares for the index creation by updating metadata and ensuring the change is distributed consistently across the cluster.
- Reorganization Phase: The index is built in the background. TiDB processes existing data to populate the index incrementally, allowing read and write operations to proceed concurrently.
- Commit Phase: The new index is finalized and made active. The metadata is updated, and the index becomes available to the query optimizer, improving performance without any downtime along the way.
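For index builds on large tables, TiDB also lets you tune how aggressively the backfill runs. A sketch using the TiDB system variables `tidb_ddl_reorg_worker_cnt` and `tidb_ddl_reorg_batch_size`; the values shown are illustrative and should be sized to your cluster:

```sql
-- Throttle or accelerate the background backfill before starting.
SET GLOBAL tidb_ddl_reorg_worker_cnt = 4;
SET GLOBAL tidb_ddl_reorg_batch_size = 256;

-- Standard syntax; the index is populated online during the
-- Reorganization phase while queries continue to run.
CREATE INDEX idx_users_email ON users (email);
```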

Modifying Columns Safely at Scale
Modifying a column, such as changing its type or attributes, is handled with similar efficiency:
- Prepare Phase: TiDB updates the table’s metadata to reflect the changes in the column’s definition.
- Reorganization Phase: The actual data transformation is performed in the background. For example, changing a column from INT to BIGINT converts existing data incrementally while the table remains accessible.
- Commit Phase: The schema change is finalized. The metadata is updated to reflect the new column type, making the change fully active.
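The INT-to-BIGINT widening mentioned above is an ordinary statement from the client’s point of view; a sketch against an illustrative `orders` table:

```sql
-- Lossless widening, applied online; the table stays readable
-- and writable while TiDB converts existing values.
ALTER TABLE orders MODIFY COLUMN quantity BIGINT NOT NULL;
```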

Alternative Tools: gh-ost and pt-osc for Safer Schema Migrations
If you’re running MySQL (or a MySQL-compatible database) and can’t rely on native online DDL for a specific change, two battle-tested online schema change tools are gh-ost (GitHub’s Online Schema Transmogrifier) and pt-online-schema-change (Percona Toolkit). Both create a shadow table, backfill it in the background, mirror ongoing writes into it (pt-osc via triggers, gh-ost via the binlog stream), and then cut over with minimal interruption. They differ in how they hook in, how much they automate, and how they behave under load.
When to use which (at a glance):
- Use gh-ost when you want fewer triggers, better control via binlog stream, easier throttling/pauses, rich automation hooks (hooks, cutover flags), and safer behavior on very busy masters.
- Use pt-osc when you need a widely deployed, mature option that works across many MySQL versions, including older setups, and you’re comfortable with trigger-based mirroring and simple throttling.
Feature Comparison
| Capability | gh-ost | pt-online-schema-change |
|---|---|---|
| Backfill method | Consumes binlog to replicate changes into a shadow table; minimal reliance on triggers | Uses row-level triggers on the original table to mirror DML into a shadow table |
| Cutover behavior | Controlled, low-latency cutover with automatic retries; can be fully automated or manually approved | Quick rename cutover; typically brief metadata lock; manual guardrails available |
| Speed under load | Generally faster on busy masters (binlog-driven; less trigger overhead) | Good throughput; triggers can add overhead on very hot tables |
| Downtime window | Near-zero (milliseconds at rename); resilient to brief stalls | Near-zero (milliseconds at rename); depends on lock acquisition |
| Throttling & safety | Built-in smart throttling (replication lag, load avg, custom hooks); pause/resume friendly | Throttling available (replication lag, load), but less granular than gh-ost |
| Automation & CI/CD | Strong: hooks for pre/post checks, status APIs, dry-runs; easy to script pipelines | Solid: command-line driven; commonly scripted; fewer native hooks |
| Operational complexity | Moderate; requires binlog access and proper privileges | Lower barrier to entry; relies on triggers (which add their own considerations) |
| Best fit | Large/very active tables, need precise throttling and safer cutovers | Broad MySQL estates, older versions, teams comfortable with trigger-based mirroring |
Best Practices for Zero-Downtime Schema Changes
Zero-downtime DDL is not a single command. It’s a disciplined process that treats schema evolution like any other production release: design for concurrency, test with realistic load, monitor during rollout, and verify after cutover.
Here are some principles to anchor on:
- Design for concurrency: Assume live reads/writes continue. Favor algorithms/modes that avoid long metadata locks and use background backfills with throttling.
- Test before you DDL: Reproduce production scale (row counts, indexes, skew) in a staging environment; replay representative traffic to catch plan regressions and lock contention.
- Observe in real time: Track p95/p99 latency, error rates, replication lag, and backfill throughput; set abort thresholds so you can pause safely.
- Roll forward or roll back: Every migration should have a fast, documented rollback (or fall-forward) path.
Engineer Checklist
- Concurrency validation
- Confirm algorithm/mode supports online change (e.g., INSTANT/INPLACE or equivalent).
- Run load test with read/write mix; verify p95/p99 stay within SLO during backfill.
- Check for hot partitions/regions and leader skew before starting.
- Monitoring plan
- Dashboards for latency, errors, queue depth/backfill rate, replication lag.
- Alerts with hard abort thresholds (e.g., p99 > SLO + 10% for N minutes).
- Log sampling on affected services for timeouts/retries.
- Rollback strategy
- Pre-create revert script (rename back, drop shadow structures, or disable feature flag).
- Data safety confirmed (no destructive transforms without verified backup/snapshot).
- Communication plan: who can push the big red button and under what conditions.
- Automation & tooling
- Use a migration framework/runner with idempotency, dry-run, and resume support.
- Throttling controls (rate limits tied to latency/lag); pause/resume hooks.
- Pre-/post-checks: index coverage, query plan diff, metadata lock timing.
- Post-change verification
- Compare query plans before/after (see the sketch following this checklist); watch for regressions on hot paths.
- Validate new structures are used (e.g., index adoption) and backfills completed.
- Update runbooks, dashboards, and capacity models with lessons learned.
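For the post-change verification items, a quick plan check is often all you need to confirm index adoption. A sketch reusing the illustrative `idx_created_at` index from earlier; EXPLAIN ANALYZE is available in MySQL 8.0.18+ and in TiDB:

```sql
-- Confirm the optimizer actually chooses the new index on a hot query.
EXPLAIN ANALYZE
SELECT order_id, total
FROM orders
WHERE created_at >= NOW() - INTERVAL 1 DAY;

-- If the plan still scans the base table, refresh statistics and re-check.
ANALYZE TABLE orders;
```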
Challenges and Considerations
While TiDB’s online DDL offers numerous advantages, it’s important to be aware of a few potential challenges so you can plan and execute schema changes effectively:
- Resource Usage: Online DDL operations can be resource-intensive. Monitoring and managing resources is crucial to prevent performance issues.
- Complex Schema Changes: Some changes might require careful planning and execution.
- Version Compatibility: Ensure the TiDB version supports the specific online DDL operations needed.
Conclusion: Building Resilient MySQL Workloads with TiDB
Online DDL is no longer optional—it’s how modern teams evolve schemas without sacrificing uptime, performance, or velocity. TiDB’s three-phase protocol (Prepare → Reorganize → Commit), distributed concurrency control, and background backfills keep reads and writes flowing during change, so large MySQL-compatible estates can scale safely and predictably.
By leaning on TiDB’s online DDL, high availability, and elastic scale, organizations gain operational flexibility (ship schema during business hours), performance stability (p95/p99 stay within SLOs), and a cleaner path to growth (no maintenance windows, fewer fire drills). The result is a resilient data layer that supports fast-moving applications across regions, tenants, and workloads.
Want to enhance the efficiency and reliability of your database operations? Start a free TiDB Cloud cluster to validate online DDL, rolling changes, and performance under load.