Authors: Qiang Wu, Xiang Zhang (TiDB Cloud Engineers)
Editors: Fendy Feng, Tom Dewan
TiDB Cloud is a fully-managed Database-as-a-Service (DBaaS) that brings TiDB, an open source Hybrid Transactional and Analytical Processing (HTAP) database, to your cloud.
Airbyte is an open-source data integration engine that helps you consolidate your data in data warehouses, data lakes, and databases.
Snowflake is a Software-as-a-Service (SaaS) offering that provides a single, global platform for data warehousing with seamless data collaboration.
Airbyte can help you connect TiDB Cloud with Snowflake, migrate data from TiDB Cloud, the source, to Snowflake, the destination, and leverage Snowflake's strong analytical capability to answer your multi-dimensional analytical queries. (The source data can come from either TiDB Cloud or a self-deployed TiDB cluster.) Airbyte can also synchronize data from TiDB Cloud to Snowflake periodically, so your analytical view of the data stays up to date.
In this tutorial, we will walk you through how to use Airbyte to connect TiDB Cloud with Snowflake, and how to migrate data from the former to the latter.
Get started with TiDB Cloud Dev Tier
- Sign up for a free TiDB Cloud account and log in. If you already have a TiDB Cloud account, log in directly.
- On the Active Clusters page, click Create Cluster. Then, in the Developer Tier box, select 1 year Free Trial.
Note: You can also select the paid TiDB Dedicated Tier to get better performance.
- Set up your cluster name, cloud provider (only AWS is available for the Dev Tier), and region. (A nearby region is recommended.) Then, click Create to create your cluster.
Note: It usually takes 1 to 3 minutes to create your TiDB Cloud cluster.
- After you create a cluster, you will be directed to the Active Clusters page. Click the cluster name you just created and go to the cluster control panel.
- Click Connect to create a traffic filter.
- In the pop-up window, set `0.0.0.0/0` as the IP address to allow access from any IP address.
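The `0.0.0.0/0` filter is a CIDR block with a zero-length prefix, so it matches every possible IPv4 address. A quick sketch with Python's standard `ipaddress` module shows why:

```python
import ipaddress

# The traffic filter 0.0.0.0/0 is a CIDR block whose prefix length is 0,
# so every IPv4 address falls inside it.
allow_all = ipaddress.ip_network("0.0.0.0/0")

for addr in ("10.0.0.1", "203.0.113.7", "255.255.255.255"):
    # Membership test: is this address inside the allowed network?
    print(addr, ipaddress.ip_address(addr) in allow_all)  # prints True for each
```

For a production cluster, a narrower CIDR range limited to the machines that actually need access is the safer choice.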
Import sample data to TiDB Cloud
After you create a TiDB Cloud cluster, you can migrate data to TiDB Cloud. Here, we use Capital Bikeshare’s system dataset as the sample data to walk you through how to import data to TiDB Cloud.
- Click Import and you will be directed to the Data Import Task page.
- Fill in the following fields on the Data Import Task page:
- Data Source Type:
- Bucket URL:
- Data Format:
- Target Cluster Username and Password: the same as those of your TiDB cluster.
- After you fill in all the fields, click Import to import the sample data to TiDB Cloud. The import process usually takes around 3 minutes to finish.
Set up Airbyte-specific entities in Snowflake
To write data into Snowflake, you need to create Airbyte-specific Snowflake entities, including a warehouse, a database, a schema, a user, and a role, with the OWNERSHIP permission.
- Sign up for a Snowflake account, which offers a 30-day free trial. If you already have one, sign in to your account directly.
- After you sign in to your Snowflake account, click Worksheets to create a new worksheet.
- Change the `airbyte_schema` in the following script if needed. Here, we use the default parameters as an example. Then, copy the whole script and paste it into the Worksheets page.
```sql
-- set variables (these need to be uppercase)
set airbyte_role = 'AIRBYTE_ROLE';
set airbyte_username = 'AIRBYTE_USER';
set airbyte_warehouse = 'AIRBYTE_WAREHOUSE';
set airbyte_database = 'AIRBYTE_DATABASE';
set airbyte_schema = 'AIRBYTE_SCHEMA';

-- set user password
set airbyte_password = 'password';

begin;

-- create Airbyte role
use role securityadmin;
create role if not exists identifier($airbyte_role);
grant role identifier($airbyte_role) to role SYSADMIN;

-- create Airbyte user
create user if not exists identifier($airbyte_username)
  password = $airbyte_password
  default_role = $airbyte_role
  default_warehouse = $airbyte_warehouse;
grant role identifier($airbyte_role) to user identifier($airbyte_username);

-- change role to sysadmin for warehouse / database steps
use role sysadmin;

-- create Airbyte warehouse
create warehouse if not exists identifier($airbyte_warehouse)
  warehouse_size = xsmall
  warehouse_type = standard
  auto_suspend = 60
  auto_resume = true
  initially_suspended = true;

-- create Airbyte database
create database if not exists identifier($airbyte_database);

-- grant Airbyte warehouse access
grant USAGE on warehouse identifier($airbyte_warehouse) to role identifier($airbyte_role);

-- grant Airbyte database access
grant OWNERSHIP on database identifier($airbyte_database) to role identifier($airbyte_role);

commit;

begin;

USE DATABASE identifier($airbyte_database);

-- create schema for Airbyte data
CREATE SCHEMA IF NOT EXISTS identifier($airbyte_schema);

commit;

begin;

-- grant Airbyte schema access
grant OWNERSHIP on schema identifier($airbyte_schema) to role identifier($airbyte_role);

commit;
```
- Click Query to execute the script.
- Go back to the Snowflake home page and click Databases. You'll see that a database named `AIRBYTE_DATABASE` has been created with an `AIRBYTE_SCHEMA` schema.

Note: Make a note of these parameters. You will enter some of them again later when you create a Snowflake destination. You can also learn more in the Airbyte documentation.
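As a hypothetical convenience, you might jot down the values the script created in a small mapping so they are at hand when you configure the Airbyte destination later. The `host` placeholder below is account-specific; the other values mirror the script's default variables:

```python
# Hypothetical sketch: the Snowflake entities created by the script above,
# collected in one place for the Airbyte destination form.
snowflake_destination = {
    "host": "<your-account>.snowflakecomputing.com",  # placeholder, account-specific
    "role": "AIRBYTE_ROLE",
    "warehouse": "AIRBYTE_WAREHOUSE",
    "database": "AIRBYTE_DATABASE",
    "schema": "AIRBYTE_SCHEMA",
    "username": "AIRBYTE_USER",
}

# Print each field so it can be copied into the Airbyte UI.
for field, value in snowflake_destination.items():
    print(f"{field}: {value}")
```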
Deploy Airbyte in your own workspace
It only takes a few steps to deploy Airbyte locally.
- Install Docker, a container platform, on your machine.
- Clone the Airbyte source code to your machine:

```shell
$ git clone https://github.com/airbytehq/airbyte.git
$ cd airbyte
```
- Run the Docker images with `docker-compose`. When you see the Airbyte banner, the UI is ready to go at http://localhost:8000.

```shell
$ docker-compose up
```
Create a TiDB Cloud Source in Airbyte
After you deploy Airbyte, you can set up a new source connector for TiDB Cloud.
- Click Sources in the Airbyte dashboard and select TiDB as your source type.
- Fill in all the fields including the Name, Host, Port, Database, Username, and Password on the panel.
- In the JDBC URL Params field, fill in either `enabledTLSProtocols=TLSv1.2` or `enabledTLSProtocols=TLSv1.3` to set your TLS protocol to TLSv1.2 or TLSv1.3.
- Enable SSL Connection.
Note: If you want to disable the TLS connection with TiDB Cloud, change the previous parameter in the JDBC URL Params field to `useSSL=false` and disable SSL Connection.
- Click Set up source to create the TiDB Cloud source.
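If you prefer not to memorize the parameter strings, a small helper like the hypothetical sketch below can assemble the JDBC URL Params value. The option names (`enabledTLSProtocols`, `useSSL`) are MySQL JDBC driver properties, which apply here because TiDB speaks the MySQL protocol:

```python
from typing import Optional

def jdbc_url_params(tls_version: Optional[str] = "TLSv1.2") -> str:
    """Hypothetical helper: build the value for Airbyte's "JDBC URL Params" field."""
    if tls_version is None:
        # Plaintext connection: also disable "SSL Connection" in the Airbyte UI.
        return "useSSL=false"
    if tls_version not in ("TLSv1.2", "TLSv1.3"):
        raise ValueError("TiDB Cloud requires TLSv1.2 or TLSv1.3")
    return f"enabledTLSProtocols={tls_version}"

print(jdbc_url_params("TLSv1.3"))  # enabledTLSProtocols=TLSv1.3
print(jdbc_url_params(None))       # useSSL=false
```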
Create a Snowflake destination in Airbyte
- Click Destinations in the Airbyte dashboard and select Snowflake as the destination type.
- Fill in all the fields including the Name, Host, Role, Warehouse, Database, Default Schema, and Username. These are the parameters you defined in the Set up Airbyte-specific entities in Snowflake section.
- In the Authorization Method drop-down box, select Username and Password to associate the Password with the Username.
- In the Loading Method drop-down box, select [Recommended] Internal Staging.
- Click Set up destination to create the Snowflake destination.
Build the connection and transform data from TiDB Cloud to Snowflake
After you set up the TiDB Cloud source and Snowflake destination, you can build the connection between the two.
- Select Destinations in the Airbyte dashboard and click New Connection.
- Click Create a source and select Select an existing source. Then, in the drop-down box, choose the TiDB source you previously created and click Use existing source.
- Similarly, select the Snowflake destination you created previously and click Use existing destination.
- After you select the source and destination, go to the Set up connection panel. Create a name to help you identify this connection. Here, we use TiDB-Snowflake as the connection name.
- In the Replication frequency field, select Manual. This means the data will be migrated manually.
- In the Destination Namespace field, select Mirror source structure to make the data stored in the same location in the destination as in the source.
- Select and activate the schema you want to migrate from TiDB Cloud to Snowflake. Here, as an example, we select the `bikeshare` schema.
- The TiDB source connector supports two sync modes: Incremental and Full Refresh. In Incremental mode, Airbyte only reads records added to the source since the last sync task. (The first Incremental sync is equivalent to a Full Refresh sync.) In Full Refresh mode, Airbyte reads all the records in the source and migrates them to the destination during every sync task.
- You can set a sync mode for each table separately in the Namespace section in Airbyte.
- In the Normalization & Transformation section, select Normalized tabular data.
- Click Set up connection to start the data synchronization task. It usually takes a few minutes to complete this process.
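The difference between the two sync modes can be illustrated with a toy model. The `id` cursor column below is an assumption for illustration only, not part of the Airbyte API:

```python
# Toy model of the two TiDB connector sync modes described above.
def full_refresh(rows):
    # Full Refresh: read every row on every sync.
    return list(rows)

def incremental(rows, state):
    # Incremental: read only rows whose cursor value ("id", assumed here)
    # is greater than the highest value migrated so far.
    new_rows = [r for r in rows if r["id"] > state.get("cursor", 0)]
    if new_rows:
        state["cursor"] = max(r["id"] for r in new_rows)
    return new_rows

source = [{"id": 1}, {"id": 2}]
state = {}

# The first incremental sync behaves like a full refresh: everything is new.
print(incremental(source, state))  # [{'id': 1}, {'id': 2}]

# A later sync only picks up rows added since the last run.
source.append({"id": 3})
print(incremental(source, state))  # [{'id': 3}]
print(full_refresh(source))        # all three rows, every time
```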
Congratulations! After you finish all the steps above, you have successfully migrated the data from TiDB Cloud to Snowflake and stored it in the `AIRBYTE_DATABASE.BIKESHARE` schema. Now, you can run analytical queries in Snowflake.
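For example, here is the kind of multi-dimensional query you might run in a Snowflake worksheet. The `TRIPS` table and column names are assumptions based on the public Capital Bikeshare dataset layout, so adjust them to your actual schema:

```python
# Hypothetical example: an analytical query over the synced bikeshare data.
# Table and column names are assumptions and may differ in your schema.
query = """
SELECT DATE_TRUNC('month', started_at) AS ride_month,
       member_casual,
       COUNT(*) AS rides
FROM AIRBYTE_DATABASE.BIKESHARE.TRIPS
GROUP BY ride_month, member_casual
ORDER BY ride_month
""".strip()

print(query)
```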
In this tutorial, you’ve learned how to configure TiDB Cloud, Snowflake, and Airbyte, and how to use Airbyte to connect the other two. You also learned how to migrate data from TiDB Cloud and Snowflake for multi-dimensional analytics. TiDB source connector is already available in Airbyte local deployment, and soon we will release the TiDB destination connector to support data migration from other sources.
If you encounter any problems during configuration or migration, you can contact us on Twitter or our Slack channel, or join TiDB Internals to share your thoughts and feedback with us.