
Modern applications generate enormous amounts of event data: user actions, transactions, logs, and metrics, all happening in real time. To handle this scale, many teams rely on Apache Kafka, a distributed event streaming platform that decouples applications from their data pipelines and ensures reliable, high-throughput data delivery.
On the storage side, TiDB provides a distributed SQL database that scales horizontally, handles both transactional and analytical queries, and maintains low-latency performance even under heavy load.
Together, Kafka and TiDB form a powerful foundation for real-time workloads where high write throughput and fast data processing are critical.
This two-part blog tutorial explores how to integrate Kafka with TiDB. Part 1 covers the basics of streaming data from Kafka to TiDB and why this architecture is becoming increasingly popular. Part 2 will examine how TiDB performs when Kafka processes millions of messages per second and how to monitor TiDB’s internal performance.
Why Stream Data through Kafka?
A recent customer project involved an application that sent messages directly to Kafka. From Kafka, data flowed into a persistent storage layer that included systems such as SQL Server and Cassandra.
This design choice is common for systems that handle large volumes of writes. Sending data directly to a database under heavy load can lead to latency issues, slowing down the entire application. Kafka helps mitigate this by acting as a buffer between the application and the database, ensuring that high-frequency writes are first collected and processed asynchronously before reaching the storage layer.
By decoupling ingestion from persistence, Kafka maintains consistent performance and reliability even during spikes in traffic.

Fig. 1: How Kafka decouples application data streams
What is TiDB?
TiDB is an open-source, distributed SQL database designed for horizontal scalability, strong consistency, and high availability. It uses a decoupled compute and storage architecture, allowing each layer to scale independently — a key advantage for cost and performance optimization.
TiDB is MySQL-compatible, which means existing applications, drivers, and SQL syntax can often be reused with minimal modification. This compatibility also makes it easier to migrate from other databases such as MySQL, PostgreSQL, or MongoDB.

Fig. 2: How TiDB complements Kafka for unified workloads
Let’s start with an example from a test TiDB instance running on the cloud:
ankitkapoor@Ankits-MacBook-Air bin % ./mysql -uankit -hxxx -P 4000 -p
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| PERFORMANCE_SCHEMA |
| ankit              |
| kafka              |
| mysql              |
| test               |
+--------------------+
6 rows in set (0.10 sec)
What About TiDB to Kafka?
TiDB can also stream data out to Kafka using TiCDC (see the official TiCDC documentation).
TiCDC (TiDB Change Data Capture) is a component that captures real-time changes from TiDB and replicates them downstream. It reads Raft logs, internal records that track every change in the TiDB cluster, and pushes those changes to external systems like Kafka, another TiDB cluster, or cloud storage.
For reference, TiDB Raft log files typically look like this:
-rw-r--r--  1 ankitkapoor  cc   69B 11 Sep 02:29 0000000000000001.rewrite
-rw-r--r--  1 ankitkapoor  cc    0B 11 Sep 02:39 LOCK
-rw-r--r--  1 ankitkapoor  cc  869K 11 Sep 02:39 0000000000000001.raftlog   <-- Raft log
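As a rough sketch, a TiCDC changefeed that replicates changes into Kafka can be created with the cdc CLI along these lines (the server address, topic name, and sink-URI parameters below are placeholders, and the exact flags vary by TiCDC version, so check the TiCDC documentation for your release):
cdc cli changefeed create --server=http://127.0.0.1:8300 --sink-uri="kafka://127.0.0.1:9092/tidb-changes?protocol=canal-json" --changefeed-id="tidb-to-kafka"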
While TiCDC handles the TiDB-to-Kafka direction, this article focuses on the reverse: Kafka to TiDB.
Kafka to TiDB: Overview
Streaming data from Kafka to TiDB is often achieved using Kafka Connect, an open-source framework for building scalable and reliable data pipelines. While other tools like PySpark can accomplish this, Kafka Connect provides a simpler and more performant approach, especially for production environments.
Since TiDB is MySQL-compatible, existing MySQL JDBC drivers can be used to set up the data stream between Kafka and TiDB.
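For reference, the connection string follows the standard MySQL JDBC format, using TiDB’s default SQL port of 4000 (host and database names below are placeholders):
jdbc:mysql://<tidb-host>:4000/<database>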
Requirements
To follow this guide, the following components are required:
- Kafka
- Zookeeper
- kafka-topics
- kafka-console-producer
- kafka-console-consumer
- Kafka Connect JDBC sink connector
- MySQL client
- TiDB cluster
Test Environment
- Local machine: macOS 15.6.1
- MySQL client: 9.4.0 (any recent version will work)
- Database: TiDB Cloud Serverless (publicly available)
What this Blog Won’t Cover
This blog assumes basic familiarity with Kafka fundamentals such as Zookeeper, Kafka topics, messages, and streaming concepts. Those topics are well-documented in the official Kafka resources and will not be repeated here.
Getting Started
Step 1: Install Kafka
brew install kafka
Expected message:
To start kafka now and restart at login:
brew services start kafka
Or, if you don't want/need a background service you can just run:
/opt/homebrew/opt/kafka/bin/kafka-server-start /opt/homebrew/etc/kafka/server.properties
For Linux, refer to the official setup guide.
Step 2: Install Zookeeper
brew install zookeeper
Expected message:
To start zookeeper now and restart at login:
brew services start zookeeper
Or, if you don't want/need a background service you can just run:
SERVER_JVMFLAGS="-Dapple.awt.UIElement=true" /opt/homebrew/opt/zookeeper/bin/zkServer start-foreground
Start Zookeeper:
brew services start zookeeper
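With Zookeeper running, start the Kafka broker as well; the rest of this guide assumes a single broker listening on localhost:9092. On macOS with Homebrew, one way to do this is:
brew services start kafka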
Step 3: Download Dependencies
Download the following:
- Kafka Connect JDBC:
confluentinc-kafka-connect-jdbc-10.8.4
- MySQL JDBC Connector:
mysql-connector-j-9.4.0
Move the MySQL connector JAR into the Confluent connector’s lib directory, then create two configuration files (a sketch of both steps follows the list):
- connect-standalone.properties
- mysql-sink-connector.properties
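A minimal sketch of these steps, assuming both downloads were extracted into the current directory (the paths below are illustrative; adjust them to your layout):
# Copy the MySQL JDBC driver next to the Confluent JDBC connector JARs
cp mysql-connector-j-9.4.0/mysql-connector-j-9.4.0.jar confluentinc-kafka-connect-jdbc-10.8.4/lib/
# Create the two Kafka Connect configuration files (contents shown in Step 4)
touch connect-standalone.properties mysql-sink-connector.properties
The plugin.path setting in connect-standalone.properties should then point at the directory that contains the unpacked connector folder.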
Step 4: Configure Kafka Connect
connect-standalone.properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
plugin.path=/pathto_sink_jdbc_connector/
mysql-sink-connector.properties
name=jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
# Topic to consume from (created later in Step 7; choose any name you like)
topics=kafka_to_TiDB
connection.url=jdbc:mysql://hostname:4000/yourdatabase
connection.user=user_name
connection.password=password
auto.create=false
auto.evolve=false
insert.mode=insert
pk.mode=none
table.name.format=kafka_to_TiDB
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
transforms=filter
transforms.filter.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.filter.include=id,user
Step 5: Create the Target TiDB Table
Because the sink configuration sets auto.create=false, the target table must already exist in TiDB before the connector starts:
CREATE TABLE `kafka_to_TiDB` (
  `id` int DEFAULT NULL,
  `user` char(255) DEFAULT NULL
);
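To double-check, describe the new table from the MySQL client, in the database where it was created (exact output varies slightly between TiDB versions):
mysql> DESC kafka_to_TiDB;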
Step 6: Start Kafka Connect
Run the following command from the directory that contains the two configuration files created earlier (connect-standalone.properties and mysql-sink-connector.properties):
connect-standalone connect-standalone.properties mysql-sink-connector.properties
Successful startup logs will include:
kafka_to_TiDB-0 (org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker:58)
[2025-08-18 19:58:35,603] INFO [jdbc-sink|task-0] [Consumer clientId=connector-consumer-jdbc-sink-0, groupId=connect-jdbc-sink] Found no committed offset for partition kafka_to_TiDB-0 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1508)
[2025-08-18 19:58:35,607] INFO [jdbc-sink|task-0] [Consumer clientId=connector-consumer-jdbc-sink-0, groupId=connect-jdbc-sink] Resetting offset for partition kafka_to_TiDB-0 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:9092 (id: 1 rack: null isFenced: false)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState:447)
[2025-08-18 19:58:46,968] INFO [jdbc-sink|task-0] JdbcDbWriter Connected (io.confluent.connect.jdbc.sink.JdbcDbWriter:57)
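Optionally, the connector’s health can also be checked through the Kafka Connect REST API, which listens on port 8083 by default (the jdbc-sink name comes from the name= setting in the sink configuration):
curl -s http://localhost:8083/connectors/jdbc-sink/status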
Step 7: Create and Test a Kafka Topic
Create the topic that the sink connector subscribes to, then start a producer.
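A minimal topic-creation command for a single-broker local setup (the partition and replication-factor values here are illustrative):
kafka-topics --bootstrap-server localhost:9092 --create --topic kafka_to_TiDB --partitions 1 --replication-factor 1
Next, start a console producer: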
kafka-console-producer --bootstrap-server localhost:9092 --topic kafka_to_TiDB --property parse.key=false --property "key.separator=:"
Then, start a consumer in a separate terminal to verify that messages arrive as expected:
kafka-console-consumer --bootstrap-server localhost:9092 --topic kafka_to_TiDB --from-beginning
hello
kafka
whats goin on
man
"User signed up"
"User signed up"
"User signed up"
{"id": 123, "status": "active"}
{"temperature": 25.4}
Step 8: Send Messages to Kafka
kafka-console-producer --bootstrap-server localhost:9092 --topic kafka_to_TiDB --property parse.key=false --property "key.separator=:"
>{"schema":{"type":"struct","fields":[{"field":"id","type":"int32"},{"field":"user","type":"string"}],"optional":false,"name":"kafka_to_TiDB"},"payload":{"id":1,"user":"Ankit"}}
Because the sink configuration sets schemas.enable=true for its converters, each message must embed a schema block alongside the payload, as in the record above. The Kafka Connect logs should confirm the successful write:
[2025-08-18 19:58:48,424] INFO [jdbc-sink|task-0] Setting metadata for table "ankit"."kafka_to_TiDB" to Table{name='"ankit"."kafka_to_TiDB"', type=TABLE columns=[Column{'id', isPrimaryKey=false, allowsNull=true, sqlType=INT}, Column{'user', isPrimaryKey=false, allowsNull=true, sqlType=CHAR}]} (io.confluent.connect.jdbc.util.TableDefinitions:64)
[2025-08-18 19:58:48,725] INFO [jdbc-sink|task-0] Completed write operation for 1 records to the database (io.confluent.connect.jdbc.sink.JdbcDbWriter:100)
[2025-08-18 19:58:48,726] INFO [jdbc-sink|task-0] Successfully wrote 1 records. (io.confluent.connect.jdbc.sink.JdbcSinkTask:91)
Verify in TiDB
Finally, connect to TiDB using the MySQL client:
./mysql -u 'ankit' -hhostname -P 4000 -p
Then query the table:
mysql> select * from ankit.kafka_to_TiDB;
+------+-------+
| id   | user  |
+------+-------+
|    1 | Ankit |
+------+-------+
You’ll see that your data was successfully inserted, and Kafka is now streaming events into TiDB.
Conclusion
By streaming data from Kafka to TiDB, organizations can take advantage of Kafka’s ability to handle massive event throughput while leveraging TiDB’s distributed SQL capabilities for scalable, real-time data processing. This setup helps reduce latency, prevent write bottlenecks, and ensure application performance remains smooth even under demanding workloads.
In Part 2 of this blog tutorial, we’ll dive into performance testing and observability, exploring how this architecture behaves under millions of messages per second and how to effectively monitor TiDB’s performance.
Want to try this yourself? Experience TiDB in action with the TiDB Cloud Quick Start Lab. For a deeper dive into distributed SQL, check out the TiDB University Courses with self-paced modules that cover everything from TiDB fundamentals to advanced performance tuning and real-world streaming integrations.