Online Data Definition Language (DDL) is a crucial feature for modern databases and a cornerstone of MySQL modernization strategies. It allows schema changes without the significant downtime or locking that would otherwise disrupt database operations: changes are carried out while the database remains available for reads and writes, so ongoing activity continues uninterrupted.
Online DDL is particularly important for applications that require high availability and where maintenance windows are not feasible or are too disruptive. It is essential for large, operational databases that need to scale, evolve, and undergo schema changes without affecting user experience or service availability.
In this blog post, we’ll explore what online DDL is and then dive into how TiDB, a distributed SQL database, efficiently handles online DDL operations.
What Is Online DDL—and Why It Matters for Modern MySQL Systems
Online DDL is the ability to evolve a schema—add columns, create or drop indexes, change constraints—while the database stays read/write available. Instead of blocking tables, the system performs changes concurrently with live traffic, which preserves uptime, protects p95/p99 latency, and improves online DDL performance by avoiding long metadata locks that stall queries or applications.
Building on the Google F1 approach, online DDL is split into safe transition stages (e.g., prepare → backfill/validation → publish) so reads/writes continue as the change progresses. Data backfills run in the background with rate limits, new structures (like “shadow” indexes) are built alongside existing ones, and metadata is switched atomically only at the end—so nodes can lag by at most one DDL version without corruption or outages. The result is predictable performance during schema evolution, even on large tables under production load, especially with TiDB DDL optimizations that parallelize and throttle backfills safely.
From Locking to Non-Blocking: The Evolution of DDL Operations
- Non-blocking Operations: Avoids long-term locks on tables.
- Minimal Downtime: Keeps the database available during schema changes.
- Concurrency: Allows simultaneous DML (Data Manipulation Language) operations.
- Phased Approach: Performs changes in multiple stages to reduce impact.
Planning a MySQL migration? It helps to understand where native MySQL shines and where a distributed engine adds guarantees.
How MySQL Handles Online DDL: Strengths and Trade-offs
MySQL’s online DDL behavior is controlled primarily by the ALGORITHM and LOCK clauses of ALTER TABLE (a combined example follows the lists below):
- Algorithms:
- ALGORITHM=COPY rebuilds the table by making a full copy. It’s the most broadly compatible approach but also the most disruptive and the slowest.
- ALGORITHM=INPLACE changes structures alongside the live table (e.g., creates a “shadow” index, backfills in the background) and then switches metadata—no full table copy.
- ALGORITHM=INSTANT (MySQL 8.0+) updates only metadata for supported operations (for example, add a column at the end). There’s no data backfill or table rebuild, so it completes almost immediately.
- Concurrency (LOCK):
- LOCK=NONE keeps read/write traffic flowing during most of the operation, taking only brief metadata locks at start/commit.
- LOCK=SHARED permits concurrent reads but blocks writes for the duration of the operation.
- LOCK=EXCLUSIVE blocks both reads and writes for the duration (typically paired with COPY).
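To make this concrete, here’s a minimal sketch of how the clauses combine. The table and column names (`orders`, `created_at`, `note`) are illustrative, and whether each combination is accepted depends on your MySQL version and the specific operation:

```sql
-- Build a secondary index in place while reads and writes continue;
-- MySQL rejects the statement if this combination isn't supported.
ALTER TABLE orders
  ADD INDEX idx_created_at (created_at),
  ALGORITHM=INPLACE, LOCK=NONE;

-- Metadata-only column addition (MySQL 8.0+); completes almost instantly.
ALTER TABLE orders
  ADD COLUMN note VARCHAR(255) DEFAULT NULL,
  ALGORITHM=INSTANT;
```

Requesting ALGORITHM and LOCK explicitly is a useful guardrail: if MySQL can’t honor the requested level, the statement fails immediately instead of silently falling back to a more disruptive mode.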
Here’s what the phased flow (INPLACE) looks like:
- Prepare: Acquire a short metadata lock, record the planned change, and start background workers.
- Reorg (background): Build the new structure (e.g., index backfill) while DML continues; changes are tracked so new/updated rows are included.
- Commit: Take a brief metadata lock to atomically switch pointers to the new structure; release.
With ALGORITHM=INSTANT, supported alterations (notably ADD COLUMN, plus other metadata-only changes such as setting a column default) finish in milliseconds because only the data dictionary changes. There’s no table copy, no backfill, and the operation typically holds just a short metadata lock. Limitations apply (e.g., before MySQL 8.0.29 the new column must be appended at the end of the table; certain data type and position changes still require INPLACE or COPY).
Picking the right algorithm + lock level lets you evolve schemas with minimal latency impact and near-zero downtime.
Understanding Metadata Locks and Non-Blocking Schema Changes
MySQL’s InnoDB engine allows many schema changes without blocking access to the table. For instance, adding a column or creating an index is done without locking the entire table, as shown below.

- Metadata Update: MySQL updates the table’s metadata to include the new column.
- Background Processing: The InnoDB engine performs the addition of the new column in the background. Existing rows are updated incrementally to include the new column without blocking reads or writes.
- Concurrent Operations: While the background task runs, the table remains fully accessible. New writes include the new column immediately, and existing data is updated gradually.
Online Index Creation and Instant DDL
MySQL supports creating indexes concurrently, meaning the table remains available for reads and writes while the index builds.

- Metadata Preparation: MySQL updates the table metadata to include the index definition.
- Index Build Process: The InnoDB engine builds the index in the background. It reads existing table data and populates the index incrementally.
- Minimal Disruption: During the index build, the table remains accessible. Queries and updates proceed without being blocked, and the new index is used only after it is fully built and activated.
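If you need to watch a long-running index build, InnoDB reports progress through Performance Schema stage events. A minimal sketch, assuming the `stage/innodb/alter%` instruments are available in your build:

```sql
-- One-time setup: enable the ALTER TABLE stage instruments and consumers.
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES' WHERE NAME LIKE 'stage/innodb/alter%';
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES' WHERE NAME LIKE '%stages%';

-- While the ALTER TABLE or CREATE INDEX runs in another session,
-- poll the estimated progress of the build.
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED
FROM performance_schema.events_stages_current;
```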
How TiDB Redefines Online DDL: Scalable, Distributed, and Instant
MySQL’s best-case online DDL can be excellent—especially with ALGORITHM=INSTANT or INPLACE—but it’s fundamentally single-node concurrency with brief metadata locks and table-local backfills. TiDB inherits the online benefits and then extends them to a distributed SQL setting: schema change is coordinated across many nodes, backfills are parallelized and throttled cluster-wide, and reads/writes keep flowing without hotspots or long metadata locks.
As mentioned earlier, TiDB implements a three-phase schema change protocol—Prepare → Reorganize → Commit—adapted for a multi-node cluster.
- Prepare: A DDL owner acquires a short metadata lease, records the intent, and installs new descriptors (e.g., shadow index/column) so DML can continue safely.
- Reorganize: Background workers backfill and validate in parallel across regions/stores while normal reads/writes proceed under MVCC; rate limits protect p95/p99 latency.
- Commit: The cluster performs an atomic metadata switch to activate the new structure, with only a brief, distributed metadata lock—nodes may lag by at most one DDL version and catch up without blocking traffic.
Instead of pausing tables, TiDB uses a cluster-elected DDL owner, metadata versioning, and MVCC to ensure writers/readers see consistent schemas while work progresses. The scheduler spreads backfill across leaders to avoid hot stores, and built-in throttles keep OLTP steady during heavy change.
What you get in practice:
- High availability: Continuous reads and writes during schema evolution, with only momentary metadata coordination.
- Scalability: Parallel, region-aware backfills and validation for large, distributed datasets.
- Operational flexibility: Predictable tail latency and no maintenance windows for common changes—so you can ship schema alongside features, not instead of them.
TiDB’s Three-Phase Schema Change Protocol Explained
TiDB employs a three-phase approach to handle online DDL operations:
1. Prepare Phase:
- TiDB updates metadata and prepares for the schema change.
- Ensures the change propagates safely across the cluster.
2. Reorganization Phase:
- The actual changes are applied in the background.
- For example, building a new index or adding a column incrementally.
- Allows concurrent DML operations, ensuring the database remains fully operational.
3. Commit Phase:
- The schema change is finalized and made active.
- Metadata is updated to reflect the new schema, making the change visible.
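These phases are observable from SQL. TiDB exposes the DDL job queue through ADMIN statements; a quick sketch (the job ID is illustrative):

```sql
-- Show the currently running DDL job and which node owns it.
ADMIN SHOW DDL;

-- List recent DDL jobs with their state and how many rows
-- the reorganization has processed so far.
ADMIN SHOW DDL JOBS;

-- Cancel an in-flight job by its job ID if something goes wrong.
ADMIN CANCEL DDL JOBS 123;
```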
Step-by-Step: How TiDB Executes Instant DDL Changes
In this section, we’ll walk through common online DDL operations in TiDB: adding a column, creating an index, and modifying a column. Each follows the same three steps:
- Metadata updates
- Background processing
- Final commit
Adding Columns with Zero Downtime
When adding a column in TiDB, the process avoids table locks and ensures continuous availability. Here’s how it works technically:
- Prepare Phase: TiDB updates the table’s metadata to include the new column definition. This change propagates across all nodes in the cluster to ensure consistency.
- Reorganization Phase: The new column is added in the background. TiDB uses a background task to backfill the column data for existing rows incrementally. During this phase, the table remains accessible for reads and writes, with new writes including the new column data.
- Commit Phase: Once the backfill is complete, TiDB finalizes the schema change. The metadata is updated to reflect the new schema, making the new column fully active and visible to all subsequent operations.
This approach allows TiDB to add a column without long-term locking, ensuring minimal disruption to ongoing operations.
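In practice, none of this requires special syntax. A minimal sketch with an illustrative `users` table:

```sql
-- Plain MySQL-compatible DDL; TiDB runs Prepare -> Reorganize -> Commit
-- behind the scenes while reads and writes continue.
ALTER TABLE users ADD COLUMN last_login TIMESTAMP NULL;

-- On large tables, check how far the background work has progressed.
ADMIN SHOW DDL JOBS;
```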

Creating Indexes Instantly in TiDB
Creating an index in TiDB is another common operation that benefits from online DDL. The process ensures that queries and updates continue smoothly:
- Prepare Phase: TiDB prepares for the index creation by updating metadata and ensuring the change is distributed consistently across the cluster.
- Reorganization Phase: The index is built in the background. TiDB processes existing data to populate the index incrementally, allowing read and write operations to proceed concurrently.
- Commit Phase: The new index is finalized and made active. The metadata is updated, and the index becomes available to the query optimizer, improving performance without any downtime along the way.
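For index builds on large tables, TiDB also lets you tune how aggressively the backfill runs. A sketch using the TiDB system variables `tidb_ddl_reorg_worker_cnt` and `tidb_ddl_reorg_batch_size`; the values shown are illustrative and should be sized to your cluster:

```sql
-- Throttle or accelerate the background backfill before starting.
SET GLOBAL tidb_ddl_reorg_worker_cnt = 4;
SET GLOBAL tidb_ddl_reorg_batch_size = 256;

-- Standard syntax; the index is populated online during the
-- Reorganization phase while queries continue to run.
CREATE INDEX idx_users_email ON users (email);
```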

Modifying Columns Safely at Scale
Modifying a column, such as changing its type or attributes, is handled with similar efficiency:
- Prepare Phase: TiDB updates the table’s metadata to reflect the changes in the column’s definition.
- Reorganization Phase: The actual data transformation is performed in the background. For example, changing a column from INT to BIGINT converts existing data incrementally while the table remains accessible.
- Commit Phase: The schema change is finalized. The metadata is updated to reflect the new column type, making the change fully active.
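The INT-to-BIGINT widening mentioned above is an ordinary statement from the client’s point of view; a sketch against an illustrative `orders` table:

```sql
-- Lossless widening, applied online; the table stays readable
-- and writable while TiDB converts existing values.
ALTER TABLE orders MODIFY COLUMN quantity BIGINT NOT NULL;
```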

Alternative Tools: gh-ost and pt-osc for Safer Schema Migrations
If you’re running MySQL (or a MySQL-compatible database) and can’t rely on native online DDL for a specific change, two battle-tested online schema change tools are gh-ost (GitHub’s Online Schema Transmogrifier) and pt-online-schema-change (Percona Toolkit). Both create a shadow table, backfill it in the background, mirror ongoing writes into it (pt-osc via triggers, gh-ost via the binlog stream), and then cut over with minimal interruption. They differ in how they hook in, how much they automate, and how they behave under load.
When to use which (at a glance):
- Use gh-ost when you want fewer triggers, better control via binlog stream, easier throttling/pauses, rich automation hooks (hooks, cutover flags), and safer behavior on very busy masters.
- Use pt-osc when you need a widely deployed, mature option that works across many MySQL versions, including older setups, and you’re comfortable with trigger-based mirroring and simple throttling.
Feature Comparison
| Capability | gh-ost | pt-online-schema-change |
|---|---|---|
| Backfill method | Consumes binlog to replicate changes into a shadow table; minimal reliance on triggers | Uses row-level triggers on the original table to mirror DML into a shadow table |
| Cutover behavior | Controlled, low-latency cutover with automatic retries; can be fully automated or manually approved | Quick rename cutover; typically brief metadata lock; manual guardrails available |
| Speed under load | Generally faster on busy masters (binlog-driven; less trigger overhead) | Good throughput; triggers can add overhead on very hot tables |
| Downtime window | Near-zero (milliseconds at rename); resilient to brief stalls | Near-zero (milliseconds at rename); depends on lock acquisition |
| Throttling & safety | Built-in smart throttling (replication lag, load avg, custom hooks); pause/resume friendly | Throttling available (replication lag, load), but less granular than gh-ost |
| Automation & CI/CD | Strong: hooks for pre/post checks, status APIs, dry-runs; easy to script pipelines | Solid: command-line driven; commonly scripted; fewer native hooks |
| Operational complexity | Moderate; requires binlog access and proper privileges | Lower barrier to entry; relies on triggers (which add their own considerations) |
| Best fit | Large/very active tables, need precise throttling and safer cutovers | Broad MySQL estates, older versions, teams comfortable with trigger-based mirroring |
Best Practices for Zero-Downtime Schema Changes
Zero-downtime DDL is not a single command. It’s a disciplined process that treats schema evolution like any other production release: design for concurrency, test with realistic load, monitor during rollout, and verify after cutover.
Here are some principles to anchor on:
- Design for concurrency: Assume live reads/writes continue. Favor algorithms/modes that avoid long metadata locks and use background backfills with throttling.
- Test before you DDL: Reproduce production scale (row counts, indexes, skew) in a staging environment; replay representative traffic to catch plan regressions and lock contention.
- Observe in real time: Track p95/p99 latency, error rates, replication lag, and backfill throughput; set abort thresholds so you can pause safely.
- Roll forward or roll back: Every migration should have a fast, documented rollback (or fall-forward) path.
Engineer Checklist
- Concurrency validation
- Confirm algorithm/mode supports online change (e.g., INSTANT/INPLACE or equivalent).
- Run load test with read/write mix; verify p95/p99 stay within SLO during backfill.
- Check for hot partitions/regions and leader skew before starting.
- Monitoring plan
- Dashboards for latency, errors, queue depth/backfill rate, replication lag.
- Alerts with hard abort thresholds (e.g., p99 > SLO + 10% for N minutes).
- Log sampling on affected services for timeouts/retries.
- Rollback strategy
- Pre-create revert script (rename back, drop shadow structures, or disable feature flag).
- Data safety confirmed (no destructive transforms without verified backup/snapshot).
- Communication plan: who can push the big red button and under what conditions.
- Automation & tooling
- Use a migration framework/runner with idempotency, dry-run, and resume support.
- Throttling controls (rate limits tied to latency/lag); pause/resume hooks.
- Pre-/post-checks: index coverage, query plan diff, metadata lock timing.
- Post-change verification
- Compare query plans before/after (see the sketch following this checklist); watch for regressions on hot paths.
- Validate new structures are used (e.g., index adoption) and backfills completed.
- Update runbooks, dashboards, and capacity models with lessons learned.
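For the post-change verification items, a quick plan check is often all you need to confirm index adoption. A sketch reusing the illustrative `idx_created_at` index from earlier; EXPLAIN ANALYZE is available in MySQL 8.0.18+ and in TiDB:

```sql
-- Confirm the optimizer actually chooses the new index on a hot query.
EXPLAIN ANALYZE
SELECT order_id, total
FROM orders
WHERE created_at >= NOW() - INTERVAL 1 DAY;

-- If the plan still scans the base table, refresh statistics and re-check.
ANALYZE TABLE orders;
```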
Challenges and Considerations
While TiDB’s online DDL offers numerous advantages, it’s important to be aware of a few potential challenges so you can plan and execute schema changes effectively:
- Resource Usage: Online DDL operations can be resource-intensive. Monitoring and managing resources is crucial to prevent performance issues.
- Complex Schema Changes: Some changes might require careful planning and execution.
- Version Compatibility: Ensure the TiDB version supports the specific online DDL operations needed.
Conclusion: Building Resilient MySQL Workloads with TiDB
Online DDL is no longer optional—it’s how modern teams evolve schemas without sacrificing uptime, performance, or velocity. TiDB’s three-phase protocol (Prepare → Reorganize → Commit), distributed concurrency control, and background backfills keep reads and writes flowing during change, so large MySQL-compatible estates can scale safely and predictably.
By leaning on TiDB’s online DDL, high availability, and elastic scale, organizations gain operational flexibility (ship schema during business hours), performance stability (p95/p99 stay within SLOs), and a cleaner path to growth (no maintenance windows, fewer fire drills). The result is a resilient data layer that supports fast-moving applications across regions, tenants, and workloads.
Want to enhance the efficiency and reliability of your database operations? Start a free TiDB Cloud cluster to validate online DDL, rolling changes, and performance under load.