Scaling New Heights: Building Efficiency into Serverless Databases

Serverless makes a promise. You, as a user, can focus on solving your users’ problems while someone else worries about the infrastructure.

But not all parts of the tech stack are equally easy to make “serverless”. For example, it’s simple to see how stateless, short-running JavaScript in a standalone V8 instance is a prime candidate for the serverless model. But what about databases? They’re complex, they run indefinitely, and they’re most definitely stateful.

So, how do you build a serverless database that is cost-effective for both the operator and the developers who use it?

This was the challenge we faced when we set out to create TiDB Serverless, our distributed, MySQL-compatible serverless database. Here is the story of how we built what we believe is the world’s most efficient, most cost-effective serverless database.

Enjoy!

Diving into the Problem

Databases are a complex mix of interconnected compute and storage. That leads to a number of complications when trying to create a serverless database:

Continuous resource usage: Traditional databases consume compute and storage resources continuously, whether or not you’re actively writing or querying. True serverless pricing bills you only for active usage. That model suits stateless, short-lived processes, because the cloud provider can pool the different usage patterns of many customers to avoid idle resources. It’s much harder to achieve for a database, where background processes, such as those that maintain indexes, consume resources outside of user requests.
Varied resource needs: Different processes have different resource needs. Some workloads are heavily memory-bound, while others are CPU-intensive or I/O-intensive. The architecture of most database systems makes it hard to separate these processes and deploy each one efficiently.
Balancing reliability with resource efficiency: Distributed databases over-provision resources, for example by keeping redundant replicas and spare headroom, to improve reliability and scalability. That compounds the previous two challenges.

These challenges may help explain why some serverless database providers have switched from true serverless pricing to fixed monthly packages. They can’t let customers scale to zero, because the provider pays the same baseline cloud bill whether or not those customers are actively querying or writing to the database.
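To make that gap concrete, here is a minimal cost sketch. It assumes a hypothetical database that is busy for two hours a day and a made-up price per compute unit-hour; none of the numbers reflect real TiDB Serverless pricing. Capacity sized for the peak is paid for every hour, while true serverless pricing can only bill for what was actually used, so the operator absorbs the difference.

```python
# Hypothetical cost model: demand measured in "compute units" per hour,
# priced per unit-hour. Illustrative numbers only, not real pricing.


def provisioned_cost(hourly_demand: list[float], price_per_unit_hour: float) -> float:
    """Capacity sized for the busiest hour and paid for every hour, busy or idle."""
    peak = max(hourly_demand)
    return peak * price_per_unit_hour * len(hourly_demand)


def usage_cost(hourly_demand: list[float], price_per_unit_hour: float) -> float:
    """Pay-per-use: only the compute units actually consumed are billed."""
    return sum(hourly_demand) * price_per_unit_hour


def unbilled_idle_cost(hourly_demand: list[float], price_per_unit_hour: float) -> float:
    """What the operator pays for but cannot bill under true serverless pricing."""
    return provisioned_cost(hourly_demand, price_per_unit_hour) - usage_cost(
        hourly_demand, price_per_unit_hour
    )


if __name__ == "__main__":
    # Idle for 22 hours, then busy for 2 hours at 4 compute units.
    day = [0.0] * 22 + [4.0, 4.0]
    price = 0.5
    print(provisioned_cost(day, price))    # 48.0
    print(usage_cost(day, price))          # 4.0
    print(unbilled_idle_cost(day, price))  # 44.0
```

In this toy day, the operator pays for 48.0 but can bill only 4.0, which is why removing that idle baseline is a precondition for offering scale-to-zero pricing.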

Design Principles for a Truly Serverless Database

When we set out to design TiDB Serverless, we recognized that the only way to create a truly serverless database was to rethink the architecture. We had a head start thanks to the managed cloud version of TiDB, which gave us in-depth knowledge of what it takes to run hundreds of thousands of database instances for production workloads. Even so, there’s a big difference between a single-tenant managed database and a massively multi-tenant serverless database.

We concluded that to create a truly serverless version of TiDB, we had to follow a robust set of design principles:

Design waste out of the system: At serverless scale, small elements of waste multiply into huge costs. We need to eliminate waste so that we can minimize our own cloud bill and enable customers to scale to zero.
Observability to control costs: We need to understand where each cost comes from and be able to use the same metrics to measure the impact of any improvements we make.
Support more customers with the same resources: Running a serverless database carries several largely fixed costs, such as R&D and support. By growing our user base without increasing these fixed costs, we can reduce the cost per user as more users adopt the service.

Together, these principles would allow us to create a serverless database that stays sustainable and cost-effective over the long term. Before we look at what each principle means for the architecture of TiDB Serverless, let’s get a quick overview of TiDB itself.

TiDB: Distributed, ACID-compliant, MySQL-compatible Database

If you’re new to TiDB, here’s what you need to know:

Distributed architecture: TiDB scales horizontally, distributing data and processing across multiple nodes.
MySQL compatible: TiDB speaks the MySQL protocol, which makes migration easier if you’re coming from MySQL. In most cases, you don’t need to change a line of code.
Highly available: TiDB uses automatic failover to achieve high availability and resilience against node failures.
ACID transactions: TiDB supports robust ACID transactions, ensuring atomicity, consistency, isolation, and durability, so there is no need to compromise on data integrity.

We designed the original TiDB architecture for on-premises workloads, where infrastructure is typically made up of commodity servers rather than specialist cloud services like high-performance block storage or dedicated load balancers. As a result, the original TiDB splits workloads into specialized clusters, but each cluster is relatively monolithic, which simplifies DevOps and ensures compatibility with generic hardware.

Key: The Placement Driver (PD) cluster distributes data and workloads across the various services. The TiDB cluster performs SQL query processing and offers MySQL compatibility. TiKV is key-value storage, and TiFlash is a columnar store optimized for analytics workloads.

Serverless Needs a Different Architecture

In a serverless database, we’re operating in a very different environment:

We can accept more operational complexity, because we operate the product ourselves rather than asking customers to deploy and run it.
The cloud platform offers a far wider variety of services, so we can match each component to the service that best suits its characteristics.
Even small inefficiencies can lead to large costs.
We can also expect a relatively consistent load across the service, thanks to multitenancy. That means we can share relatively expensive services among many customers, avoiding the waste that comes from idle time and over-provisioning in single-tenant environments.

The result is an architecture of many logically isolated components that can scale independently, along with components that take full advantage of specific cloud services, for example using object storage in place of generic disks.

So, how did our three design principles lead us to this architecture?

Design Waste out of the System

The first step we took was to decompose our monolithic system into much finer-grained microservices. This approach offers several advantages; in particular, it allows us to:

Use the right tool for each job: Because each microservice is small and self-contained, processes with different resource profiles no longer have to share the same resources. For example, some of our workloads are memory-bound, and we run them on Graviton3 instances because those instances are more efficient for that kind of work. Other workloads depend more on local disk performance, which makes a different instance type more appropriate. This way, we know we’re spending every dollar as efficiently as possible.
Take advantage of greater elasticity: Finer-grained microservices mean that we can scale precisely the parts of the system that are under demand at any given moment.
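The elasticity argument can be sketched under stated assumptions. The node shape (CPU, memory, and disk IOPS bundled together) and the CPU-heavy demand profile below are hypothetical. When every resource lives on the same monolithic node, the whole bundle has to scale until its most constrained resource is satisfied, stranding the rest; when each resource is served by its own service, each one scales independently. For a like-for-like comparison, the decomposed case reuses the same unit sizes instead of modeling different instance types.

```python
import math

# Hypothetical monolithic node: every node bundles the same CPU, memory,
# and disk throughput. Values are illustrative only.
MONOLITH_NODE = {"cpu": 8, "memory_gb": 32, "disk_iops": 10_000}


def monolith_nodes(demand: dict[str, int]) -> int:
    """Nodes needed when all resources scale together as one bundle.

    The bundle must grow until its most constrained resource is satisfied.
    """
    return max(math.ceil(demand[r] / MONOLITH_NODE[r]) for r in demand)


def stranded_capacity(demand: dict[str, int]) -> dict[str, int]:
    """Capacity paid for but left unused after scaling the monolith."""
    n = monolith_nodes(demand)
    return {r: n * MONOLITH_NODE[r] - demand[r] for r in demand}


def decomposed_units(demand: dict[str, int], unit: dict[str, int]) -> dict[str, int]:
    """Units needed when each resource is served by its own independently scaled service."""
    return {r: math.ceil(demand[r] / unit[r]) for r in demand}


if __name__ == "__main__":
    # A CPU-heavy moment: lots of query processing, modest memory and I/O.
    demand = {"cpu": 64, "memory_gb": 48, "disk_iops": 12_000}
    print(monolith_nodes(demand))     # 8
    print(stranded_capacity(demand))  # {'cpu': 0, 'memory_gb': 208, 'disk_iops': 68000}
    print(decomposed_units(demand, MONOLITH_NODE))  # {'cpu': 8, 'memory_gb': 2, 'disk_iops': 2}
```

In this example, CPU demand alone forces eight monolithic nodes, leaving 208 GB of memory and 68,000 IOPS paid for but unused, whereas independently scaled services need only two memory units and two I/O units alongside the eight compute units.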

Control Costs through Greater Observability

Microservices mean many more moving parts, which makes it easier for wasteful services to hide in the noise. To let customers scale to zero and charge them only when they are actively writing to or querying the database, we must keep our own cloud costs as lean as possible.

We decided on an approach of rigorous observability. This involves setting benchmarks for what constitutes a good return on investment for each resource. We monitor every cost item across the platform to determine whether the service provides value for money. For example, if we spend $1,000 a month on a particular component of the database but receive only 50% of the expected value, we adjust the service to increase utilization. We then use the same metrics to verify whether the changes have led to an improvement.
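A minimal sketch of such a value-for-money check follows. The `CostItem` fields, the value-per-dollar benchmark, and the 0.8 review threshold are all hypothetical; "value" stands for whatever unit a benchmark measures, such as requests served. It mirrors the example above: a $1,000-a-month component delivering half its expected value is flagged, and the same ratio is then used to check whether a change helped.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    """One line of the cloud bill, paired with a benchmark for a good return.

    All field names are hypothetical. "Value" is whatever unit the benchmark
    measures for this component, for example requests served per month.
    """
    name: str
    monthly_cost: float             # dollars per month
    delivered_value: float          # value actually delivered per month
    target_value_per_dollar: float  # benchmark: value a dollar should buy

    @property
    def value_ratio(self) -> float:
        # Share of the benchmarked value this component actually delivered.
        expected = self.monthly_cost * self.target_value_per_dollar
        return self.delivered_value / expected


def needs_review(items: list[CostItem], min_ratio: float = 0.8) -> list[str]:
    """Names of components delivering less than min_ratio of their benchmark."""
    return [item.name for item in items if item.value_ratio < min_ratio]


def improved(before: CostItem, after: CostItem) -> bool:
    """Verify a change with the same metric that flagged the component."""
    return after.value_ratio > before.value_ratio


if __name__ == "__main__":
    # $1,000 a month, but only half the value the benchmark says it should buy.
    before = CostItem("component-a", 1_000, 500_000, 1_000)
    print(before.value_ratio)       # 0.5
    print(needs_review([before]))   # ['component-a']

    # After right-sizing: the same work on a cheaper footprint.
    after = CostItem("component-a", 600, 500_000, 1_000)
    print(needs_review([after]))    # []
    print(improved(before, after))  # True
```

Because the review and the verification use the same ratio, an adjustment counts as an improvement only if it moves the metric that flagged the component in the first place.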

Support More Customers with the Same Resources

The beauty of a multi-tenant system is that we can amortize its costs across many thousands of customers. That leads to two tangible benefits that feed directly into our ability to deliver serverless pricing:

More efficient resource utilization: Database traffic is often quite spiky. In a single-tenant system, we can address this by adding or removing nodes, but baseline resources still sit idle, leading to waste. In our multi-tenant TiDB Serverless, usage patterns tend to even out across the user base. That leads to higher average utilization, meaning we’re less likely to pay for cloud resources that we can’t bill on to customers.
Spreading the cost of non-cloud resources: To spread fixed costs across more users, we need to remove every blocker that would prevent us from growing the user base. For example, we need to be able to support a growing number of users with the same size of support team. We do that by focusing on an excellent developer experience, which reduces the need for people to contact support.
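The utilization argument is a form of statistical multiplexing, and a small simulation shows the effect. The traffic model here is hypothetical: each tenant has a low baseline with occasional bursts at random hours. Dedicated capacity must cover the sum of every tenant's individual peak, whereas a shared pool only needs to cover the busiest hour of the combined load, which is never larger and, when bursts rarely coincide, is usually much smaller.

```python
import random


def spiky_tenant(hours: int, rng: random.Random, base: float = 1.0,
                 spike: float = 20.0, spike_prob: float = 0.05) -> list[float]:
    """Hypothetical traffic: a low baseline with occasional bursts."""
    return [spike if rng.random() < spike_prob else base for _ in range(hours)]


def sum_of_peaks(tenants: list[list[float]]) -> float:
    """Single-tenant: each tenant gets capacity sized for its own peak."""
    return sum(max(t) for t in tenants)


def peak_of_sum(tenants: list[list[float]]) -> float:
    """Multi-tenant: one shared pool sized for the busiest hour across everyone."""
    return max(sum(hour) for hour in zip(*tenants))


def average_utilization(tenants: list[list[float]], capacity: float) -> float:
    """Fraction of the provisioned capacity actually used over the period."""
    hours = len(tenants[0])
    used = sum(sum(t) for t in tenants)
    return used / (capacity * hours)


if __name__ == "__main__":
    rng = random.Random(42)
    tenants = [spiky_tenant(24 * 30, rng) for _ in range(1_000)]  # one month
    dedicated = sum_of_peaks(tenants)
    shared = peak_of_sum(tenants)
    print(f"dedicated: {dedicated:.0f} units, "
          f"utilization {average_utilization(tenants, dedicated):.0%}")
    print(f"shared:    {shared:.0f} units, "
          f"utilization {average_utilization(tenants, shared):.0%}")
```

The exact figures depend on the random seed and the assumed burst probability, so treat them as illustrative rather than as measurements of TiDB Serverless.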

Delivering a Cost-effective Serverless Database

Today, developers around the world use TiDB Serverless to power a diverse array of applications. By re-architecting TiDB specifically for the serverless model, rather than merely offering a managed version of an existing database, we are able to offer a sustainable free tier. And when your usage grows beyond the free tier, there are no steep jumps in cost, thanks to our truly serverless pricing model.

We invite you to explore TiDB Serverless and experience its benefits firsthand. Sign up for a free account and see what TiDB Serverless can do for you.