Guest post by Hari Kishan, Senior Engineering Manager at Delhivery, and Akash Deep Verma, Director of Technology at Delhivery
As the leading fulfillment platform for digital commerce in India, Delhivery delivers a million packages a day, 365 days a year. Its 24 automated sort centers, 101 hubs, 3,100+ direct delivery centers, 1000+ partner centers, 11,000+ fleet, and 60,000+ team members run smoothly thanks to a vast network of IoT devices. Thousands of data events and messages are coming in and going out of our pipelines each second. This amounts to a massive daily data volume in terabytes, which makes operational visibility crucial for us and our stakeholders.
Recognizing the requirements, we decided to build data marts—centralized, eventually consistent databases that offer users quick access to pre-aggregated business data. This allows our stakeholders to quickly access business insights without searching through an entire data warehouse.
However, with this daunting scale, one of the major challenges was to maintain data integrity and low latency while providing the capacity for analytical workloads.
In this blog, I am going to unpack all of my learnings while migrating our data marts from Amazon Aurora to TiDB, a distributed SQL database with hybrid transactional/analytical processing (HTAP) capabilities. Hopefully, this post can provide insights to data engineering leaders, database administrators, or data architects who are considering a similar migration to TiDB or any other distributed SQL database.
OLTP, OLAP, and HTAP
To better understand the real-time data marts case at Delhivery, let’s first get familiar with three concepts that are at the core of our use case: OLTP, OLAP & HTAP:
- OLTP: Online Transaction Processing (OLTP) systems are designed for transaction-oriented applications, ensuring data integrity through ACID (Atomicity, Consistency, Isolation, Durability) properties.
- OLAP: Online Analytical Processing (OLAP) systems enable high-speed, multidimensional analysis of large data volumes, aiding in data-driven decision-making.
- HTAP: Hybrid Transaction/Analytical Processing (HTAP) combines OLTP and OLAP functionalities, permitting real-time analytics on transactional data.
Real-time Data Marts Use Case at Delhivery
Real-time data marts differ from traditional data marts in that they ingest data in real time, not at specific intervals. These data marts are critical for ground operational decision-making at Delhivery because we can’t afford any delay in synchronizing these events.
Our real-time data mart journey began in 2020 when we identified a need for centralized dashboards—specifically the EYE dashboard. This dashboard’s purpose was to provide real-time operational visibility to ground operations, enabling decision-making based on up-to-the-minute data. Examples of usages include:
- Vehicle planning & visibility: Real-time monitoring of incoming and outgoing connection schedules for Delhivery hubs.
- Performance tracking: Continuous performance tracking of Delhivery’s facilities.
- Centralized control visibility: Providing the central team with precise information on ground blockers to take appropriate actions. These could be due to various factors such as a dip in center performance, shipment aging, or congestion in incoming and outgoing connections.
- Compliances: Tracking of put and pick compliance metrics
Initial Implementation and the Challenges
We thought of solving our use cases using data warehouse tools like Redshift and Snowflake. However, none of these solutions worked for us given the design pattern and requirements for real-time data ingestion and merge.
Thus, we initially chose Amazon Aurora PostgreSQL to serve our data mart use case.
The Data Ingestion Process around Amazon Aurora
We architected our real-time data marts using Spark Streaming and Amazon Aurora. Our steaming pipeline was very simple–reading data from Kafka, processing data in Spark microbatches, and performing upsert operations in Aurora.
We modeled our database using a multi-layered architecture, which consists of a raw layer, a partitioned layer, and a data marts layer. Users did not have access to view or modify data in the raw layer. The partitioned layer is kept to maintain all partitioned tables (dimension tables generally). Below is a simple schema design of our database:
Figure 1. Multi-layered data marts architecture
Challenges We Faced with Amazon Aurora
The system initially performed well, until it had to handle a throughput beyond 3K messages per second. This marked the onset of several challenges:
- Scalability limitation: As we exceeded a throughput of 3K messages per second, Aurora’s Input/Output Operations Per Second (IOPS) limitations became a bottleneck. The scalability constraint had started impacting our operations.
- Bloating issue: Every record update led to the creation of a new record and a dead tuple (previous version of the record). When the production rate of these dead tuples outpaced the cleanup process, bloating occurred. Since VACUUM FULL was not able to claim the storage, the disk usage continuously increased. For roughly 5 TB of data, Aurora was using 30+ TB of storage.
- Maintenance burden: The bloating issue is directly tied to our maintenance challenges. With over 70 pipelines and a total write QPS surpassing 5k messages/second, we found PostgreSQL’s auto cleanup process, Auto Vacuum, failed to keep pace with the rate of dead tuple generation. Therefore, manually running
VACUUM FULLis required to recover the database. Our attempts with PostgreSQL tools like
pgcompacttablealso proved unsuccessful. Consequently, maintenance became increasingly complex and time-consuming.
- Increased cost: To accommodate the read & write workload, we had to scale to the highest available nodes (24XLarge). This led to an expenditure of approximately $100,000 per month for a three-node Aurora cluster. With this scale, Aurora turned out to be expensive because of IOPS auto-scaling.
The Searching for Database Alternatives
To resolve Amazon Aurora’s limitations, we set out to find a better alternative that met the following requirements:
- Scalable with high write QPS: The database should support at least 10k+ write QPS and be horizontally scalable.
- Real-time analytics: The database should be able to provide high-speed or real-time OLAP capabilities
- Fully distributed: The database should be distributed across multiple sites to provide high availability and fault tolerance.
- Strong consistency: The database should maintain strong consistency, ensuring that all users see the same data.
Considering all the above requirements, we initially explored many PostgreSQL alternatives including Google Spanner and YugabyteDB because we wanted to keep our change management minimal.
Google Spanner is a distributed SQL database management and storage service offered by Google. It is fully managed on the Google Cloud Platform (GCP). However, Google Spanner might not be a good use case for us because we did not find the proper tools to load historical data. We explored Harbourbridge, an open-source tool for Spanner evaluation and migration. However, it had limitations around 100 GB of data loading.
YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte. This database is very close to our use case because it was fully PostgreSQL compliant, horizontally scalable, and fully distributed. Unfortunately, it didn’t work as well because of its limitation with scalability, Our success criteria demanded 7k+ transactions per second but YugabyteDB was only able to scale up to 5k.
We also looked into other possible candidates like Google BigQuery, but none of them served our requirements well.
After the above PostgreSQL alternatives, we decided to add HTAP to our requirements, which led us to explore TiDB. It supports out-of-the-box scalability, consistency, availability, multi-site deployment topology, and many more features. As a distributed SQL database, TiDB has multiple components that communicate with each other and form a complete TiDB system.
- TiDB: The stateless SQL processing component that provides the client-facing endpoint to the user. It locates the correct TiKV node to connect from PD to get the data.
- TiKV: A distributed transactional key-value data store that keeps the data in the left-closed-right-open range. Data is kept in shards with multiple replicas. TiKV uses the Raft protocol for replication.
- PD: The placement driver (PD) keeps the metadata of the cluster such as shard replica locations. It’s also responsible for scheduling the shards across TiKV nodes. PD leader handles such tasks while other nodes maintain high availability.
- TiFlash: The columnar storage extension that uses the Multi-Raft Learner protocol to replicate data from TiKV in real-time, ensuring consistent data between the TiKV row-based storage engine.
The following features of TiDB addressed our key challenges and met our operational requirements:
- Horizontal scaling: TiDB’s architecture separates compute from storage, letting you scale out or in the compute or storage capacity online as needed. The scaling process is transparent to application operations and maintenance staff.
- Fully ACID compliant: TiDB is ACID-compliant and supports transactions out of the box. It supports both optimistic and pessimistic types of transactions. This makes it unique from other databases.
- High availability: TiKV stores data in multiple replicas and uses the Multi-Raft protocol to obtain the transaction log. A transaction can only be committed when the data has been successfully written into the majority of replicas. This guarantees strong consistency and high availability when a minority of replicas go down.
- Real-time HTAP: TiDB combines both row storage (TiKV) and columnar storage (TiFlash) in the same architecture, forming a streamlined tech stack that makes it easier to produce real-time analytics on operational data.
Our TiDB Infrastructure
Our TiDB infrastructure is deployed on the VMs of leading cloud service providers. We use TiUP, TiDB’s package manager, to manage the cluster and all the administrative operations. Our cluster is deployed across 3 available zones (AZs).
Our cluster configurations are as follows:
- PD: The PD layer has 3 nodes split across Multi-AZs. PD leader handles such tasks while other nodes maintain high availability.
- TiDB: The TiDB layer has 9 nodes of the n2-highmem-8 family. These nodes were chosen based on the memory requirements, with 64 GB RAM & 8 Core CPUs allocated for each TiDB node.
- TiKV: The TiKV layer has 15 nodes of the n2-highmem-16 family which has 128 GB RAM & 16 vCORE CPUs.
By deploying our TiDB cluster across multiple AZs and carefully selecting node types to meet our processing and memory needs, we’ve created a robust, highly available infrastructure capable of handling our high data throughput requirements.
Tuning TiDB for Our Case
To make it work for our use case, we worked closely with the PingCAP team to tune the database. Here are some of the critical adjustments we made:
- Index Optimization:
Set the following parameters before starting the index.
SET @@global.tidb_ddl_reorg_worker_cnt = 16; SET @@global.tidb_ddl_reorg_batch_size = 4096;
Reset to default values after index creation.
SET @@global.tidb_ddl_reorg_worker_cnt = 4; SET @@global.tidb_ddl_reorg_batch_size = 256
- Partition Pruning: This is mainly important for partitioned tables. It analyzes the filter conditions in query statements and eliminates (prunes) partitions when they do not contain any required data.
SET @@global.tidb_ddl_reorg_worker_cnt = 16; SET @@global.tidb_ddl_reorg_batch_size = 4096;
- Tuning Analyze: Sometimes the auto analyzer in TiDB fails if a high volume of data is ingested. In that case, all the queries might use the wrong execution plan and end up scanning the full table. To avoid such a situation we made the following changes in TiDB configurations:
SET global tidb_max_auto_analyze_time = 86400; SET global tidb_enable_pseudo_for_outdated_stats = off;SET global tidb_sysproc_scan_concurrency = 15;
If you are working with partitioned tables, we suggest you run analyze table operations manually for one partition at a time to avoid analyzing failures.
Through adjustments like these, we were able to effectively streamline our use of TiDB, so that we can achieve an optimal performance for our real-time data mart.
Our Experience with TiDB
Our transition to TiDB has brought the following benefits:
- Improved queries performance
We have benchmarked 400+ queries and found that all the queries are running within SLA. We have even seen a 15-20% performance gain of P95 queries.
- Easy migration
We used the TiDB Lightning tool to migrate all of our table’s historical data from PostgreSQL to TiDB. This tool is very easy to use and very fast. We were able to load TBs of data within roughly 2-3 hours. However, it’s worth noting that there is a lot of tuning required before loading such a large amount of data.
- Strong support
We went through a couple of hiccups during the production infrastructure setup but the PingCAP support team played a very crucial role and helped us tune the cluster for the nature of the workload.
In this post, we explored the challenges of using Amazon Aurora with our use case of real-time data marts and the migration journey to TiDB. We also discussed how Delhivery is using TiDB at scale.
So far, we have deployed TiDB in our production environment. Based on our benchmarks, TiDB enables us to handle thousands of requests per second with less than 100ms latency. Moving forward, we will continue to explore more use cases that require a robust, strongly consistent, distributed SQL database.
A fully-managed cloud DBaaS for predictable workloads
A fully-managed cloud DBaaS for auto-scaling workloads