Data Migration Shard Merge Scenario

This document shows how to use Data Migration (DM) in the shard merge scenario, where the data in the sharded schemas and sharded tables of three upstream MySQL instances needs to be synchronized to a downstream TiDB cluster.

Upstream instances

Assume that the upstream schemas are as follows:

  • Instance 1

    Schema   | Tables
    user     | information, log_north, log_bak
    store_01 | sale_01, sale_02
    store_02 | sale_01, sale_02
  • Instance 2

    Schema   | Tables
    user     | information, log_east, log_bak
    store_01 | sale_01, sale_02
    store_02 | sale_01, sale_02
  • Instance 3

    Schema   | Tables
    user     | information, log_south, log_bak
    store_01 | sale_01, sale_02
    store_02 | sale_01, sale_02

Synchronization requirements

  1. Merge the user.information tables of the three upstream instances into the downstream user.information table in TiDB.
  2. Synchronize the user.log_{north|south|east} table of each upstream instance to the downstream user.log_{north|south|east} table of the same name in TiDB.
  3. Merge the store_{01|02}.sale_{01|02} tables of the three upstream instances into the downstream store.sale table in TiDB.
  4. Filter out all the deletion operations in the user.log_{north|south|east} tables of the three upstream instances.
  5. Filter out all the deletion operations in the user.information tables of the three upstream instances.
  6. Filter out all the deletion operations in the store_{01|02}.sale_{01|02} tables of the three upstream instances.
  7. Filter out the user.log_bak tables of the three upstream instances.
  8. Because the store_{01|02}.sale_{01|02} tables have auto-increment primary keys of the bigint type, conflicts occur when these tables are merged into TiDB. To avoid the conflicts, you need to modify the auto-increment primary keys.

Downstream instances

Assume that the downstream schema after synchronization is as follows:

Schema | Tables
user   | information, log_north, log_east, log_south
store  | sale

Synchronization solution

  • To satisfy the synchronization Requirements #1 and #2, configure the table routing rule as follows:

    routes:
      ...
      user-route-rule:
        schema-pattern: "user"
        target-schema: "user"
  • To satisfy the synchronization Requirement #3, configure the table routing rule as follows:

    routes:
      ...
      store-route-rule:
        schema-pattern: "store_*"
        target-schema: "store"
      sale-route-rule:
        schema-pattern: "store_*"
        table-pattern: "sale_*"
        target-schema: "store"
        target-table:  "sale"
  • To satisfy the synchronization Requirements #4 and #5, configure the binlog event filtering rule as follows:

    filters:
      ...
      user-filter-rule:
        schema-pattern: "user"
        events: ["truncate table", "drop table", "delete", "drop database"]
        action: Ignore

    Note: Requirements #4 and #5, together with Requirement #7 (which excludes the user.log_bak table entirely), mean that deletion operations are filtered out for every synchronized table in the user schema, so a schema-level filtering rule is configured here. Be aware that this rule also filters out the deletion operations of any table added to the user schema in the future.

  • To satisfy the synchronization Requirement #6, configure the binlog event filtering rule as follows:

    filters:
      ...
      sale-filter-rule:
        schema-pattern: "store_*"
        table-pattern: "sale_*"
        events: ["truncate table", "drop table", "delete"]
        action: Ignore
      store-filter-rule:
        schema-pattern: "store_*"
        events: ["drop database"]
        action: Ignore
  • To satisfy the synchronization Requirement #7, configure the black and white table lists as follows:

    black-white-list:
      log-bak-ignored:
        ignore-tables:
        - db-name: "user"
          tbl-name: "log_bak"
  • To satisfy the synchronization Requirement #8, configure the column mapping rule as follows:

    column-mappings:
      instance-1-sale:
        schema-pattern: "store_*"
        table-pattern: "sale_*"
        expression: "partition id"
        source-column: "id"
        target-column: "id"
        arguments: ["1", "store_", "sale_"]
      instance-2-sale:
        schema-pattern: "store_*"
        table-pattern: "sale_*"
        expression: "partition id"
        source-column: "id"
        target-column: "id"
        arguments: ["2", "store_", "sale_"]
      instance-3-sale:
        schema-pattern: "store_*"
        table-pattern: "sale_*"
        expression: "partition id"
        source-column: "id"
        target-column: "id"
        arguments: ["3", "store_", "sale_"]
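
To make the effect of the column mapping rules above more concrete, here is a minimal Python sketch of how a "partition id" style expression could rewrite the auto-increment key. The bit layout is an assumption that this document does not state: 1 sign bit, 4 bits for the instance ID, 7 bits for the schema ID, 8 bits for the table ID, and 44 bits for the original value. The helper names `suffix_id`, `partition_id`, and `all_mapped_ids` are hypothetical and are not part of DM.

```python
# Assumed bit widths (not given in this document): 1 sign bit, then instance,
# schema and table IDs, then the original auto-increment value.
INSTANCE_BITS, SCHEMA_BITS, TABLE_BITS, ORIGIN_BITS = 4, 7, 8, 44


def suffix_id(name, prefix):
    """Return the number left after stripping the prefix, e.g. store_02 -> 2."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return int(name[len(prefix):])


def partition_id(origin_id, instance_id, schema, table,
                 schema_prefix="store_", table_prefix="sale_"):
    """Pack the instance, schema and table IDs into the high bits of origin_id."""
    parts = [
        (instance_id, INSTANCE_BITS),
        (suffix_id(schema, schema_prefix), SCHEMA_BITS),
        (suffix_id(table, table_prefix), TABLE_BITS),
        (origin_id, ORIGIN_BITS),
    ]
    result = 0
    for value, bits in parts:
        # Reject values that would overflow into a neighbouring field.
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        result = (result << bits) | value
    return result


def all_mapped_ids(origin_id):
    """Map the same upstream id from every sharded table of every instance."""
    return [
        partition_id(origin_id, instance, schema, table)
        for instance in (1, 2, 3)
        for schema in ("store_01", "store_02")
        for table in ("sale_01", "sale_02")
    ]


if __name__ == "__main__":
    ids = all_mapped_ids(1)
    # Compare the number of mapped rows with the number of distinct ids.
    print(len(ids), len(set(ids)))
```

Under these assumptions, a row with `id = 1` in each of the twelve upstream sale tables receives a different downstream `id`, which is why every instance needs its own rule with a distinct first argument.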

Synchronization task configuration

The complete configuration of the synchronization task is shown below. For more details, see Data Migration Task Configuration File.

name: "shard_merge"
task-mode: all
meta-schema: "dm_meta"
remove-meta: false

target-database:
  host: "192.168.0.1"
  port: 4000
  user: "root"
  password: ""

mysql-instances:
  -
    source-id: "instance-1"
    route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
    filter-rules: ["user-filter-rule", "store-filter-rule", "sale-filter-rule"]
    column-mapping-rules: ["instance-1-sale"]
    black-white-list:  "log-bak-ignored"
    mydumper-config-name: "global"
    loader-config-name: "global"
    syncer-config-name: "global"

  -
    source-id: "instance-2"
    route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
    filter-rules: ["user-filter-rule", "store-filter-rule", "sale-filter-rule"]
    column-mapping-rules: ["instance-2-sale"]
    black-white-list:  "log-bak-ignored"
    mydumper-config-name: "global"
    loader-config-name: "global"
    syncer-config-name: "global"
  -
    source-id: "instance-3"
    route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
    filter-rules: ["user-filter-rule", "store-filter-rule", "sale-filter-rule"]
    column-mapping-rules: ["instance-3-sale"]
    black-white-list:  "log-bak-ignored"
    mydumper-config-name: "global"
    loader-config-name: "global"
    syncer-config-name: "global"

# Other common configs shared by all instances.

routes:
  user-route-rule:
    schema-pattern: "user"
    target-schema: "user"
  store-route-rule:
    schema-pattern: "store_*"
    target-schema: "store"
  sale-route-rule:
    schema-pattern: "store_*"
    table-pattern: "sale_*"
    target-schema: "store"
    target-table:  "sale"

filters:
  user-filter-rule:
    schema-pattern: "user"
    events: ["truncate table", "drop table", "delete", "drop database"]
    action: Ignore
  sale-filter-rule:
    schema-pattern: "store_*"
    table-pattern: "sale_*"
    events: ["truncate table", "drop table", "delete"]
    action: Ignore
  store-filter-rule:
    schema-pattern: "store_*"
    events: ["drop database"]
    action: Ignore

black-white-list:
  log-bak-ignored:
    ignore-tables:
    - db-name: "user"
      tbl-name: "log_bak"

column-mappings:
  instance-1-sale:
    schema-pattern: "store_*"
    table-pattern: "sale_*"
    expression: "partition id"
    source-column: "id"
    target-column: "id"
    arguments: ["1", "store_", "sale_"]
  instance-2-sale:
    schema-pattern: "store_*"
    table-pattern: "sale_*"
    expression: "partition id"
    source-column: "id"
    target-column: "id"
    arguments: ["2", "store_", "sale_"]
  instance-3-sale:
    schema-pattern: "store_*"
    table-pattern: "sale_*"
    expression: "partition id"
    source-column: "id"
    target-column: "id"
    arguments: ["3", "store_", "sale_"]

mydumpers:
  global:
    threads: 4
    chunk-filesize: 64
    skip-tz-utc: true

loaders:
  global:
    pool-size: 16
    dir: "./dumped_data"

syncers:
  global:
    worker-count: 16
    batch: 100
    max-retry: 100
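
As a sanity check on the configuration above, the following Python sketch replays the black list, the routing rules, and the binlog event filters against the upstream tables from this scenario. It is only an approximation: it uses `fnmatch.fnmatchcase` as a stand-in for DM's wildcard matching, and it assumes that a routing rule with a table-pattern takes precedence over a schema-only rule. The names `route`, `is_filtered`, and `downstream_tables` are hypothetical helpers, not DM APIs.

```python
from fnmatch import fnmatchcase

# (schema-pattern, table-pattern or None, target-schema, target-table or None)
ROUTES = [
    ("user", None, "user", None),
    ("store_*", None, "store", None),
    ("store_*", "sale_*", "store", "sale"),
]
# Tables excluded by the log-bak-ignored black list.
IGNORED_TABLES = {("user", "log_bak")}
# (schema-pattern, table-pattern or None, events to ignore)
FILTERS = [
    ("user", None, {"truncate table", "drop table", "delete", "drop database"}),
    ("store_*", "sale_*", {"truncate table", "drop table", "delete"}),
    ("store_*", None, {"drop database"}),
]
LOG_TABLES = {"instance-1": "log_north", "instance-2": "log_east",
              "instance-3": "log_south"}
UPSTREAM = {
    instance: {
        "user": ["information", log_table, "log_bak"],
        "store_01": ["sale_01", "sale_02"],
        "store_02": ["sale_01", "sale_02"],
    }
    for instance, log_table in LOG_TABLES.items()
}


def route(schema, table):
    """Return the downstream (schema, table), or None if the table is black-listed."""
    if (schema, table) in IGNORED_TABLES:
        return None
    fallback = (schema, table)
    for schema_pat, table_pat, target_schema, target_table in ROUTES:
        if not fnmatchcase(schema, schema_pat):
            continue
        if table_pat is None:
            # Schema-level rule: rename the schema, keep the table name.
            fallback = (target_schema, table)
        elif fnmatchcase(table, table_pat):
            # Table-level rule (assumed to win over schema-level rules).
            return (target_schema, target_table)
    return fallback


def is_filtered(event, schema, table=None):
    """Return True if a filter rule ignores this event for the schema or table."""
    for schema_pat, table_pat, events in FILTERS:
        if not fnmatchcase(schema, schema_pat):
            continue
        if table_pat is not None and (table is None or not fnmatchcase(table, table_pat)):
            continue
        if event in events:
            return True
    return False


def downstream_tables():
    """Collect every downstream table that receives data from any instance."""
    targets = set()
    for schemas in UPSTREAM.values():
        for schema, tables in schemas.items():
            for table in tables:
                target = route(schema, table)
                if target is not None:
                    targets.add(target)
    return targets


if __name__ == "__main__":
    for schema, table in sorted(downstream_tables()):
        print(f"{schema}.{table}")
```

Under these assumptions, `downstream_tables()` yields the same five tables listed in the "Downstream instances" section, and `is_filtered("delete", "store_01", "sale_02")` is true while an insert into the same table is passed through.
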
"Data Migration Shard Merge Scenario" was last updated Jan 28 2019: dm: add a dm directory (#873) (ad1d9da)