HTAP Benchmark: Why Web3 Workloads Need Web3Bench

This blog post introduces Web3Bench, a hybrid transaction/analytical processing (HTAP) benchmark that addresses the limitations of earlier HTAP benchmarks. Web3Bench is based on real-world Web3 use cases that rely on HTAP. Its data model is a simplified version of the data model of Ethereum, a decentralized blockchain. We used a sample data set from Ethereum to build a data generator driven by a scale factor.

The workload in Web3Bench focuses on simple queries representing online processing with a high number of queries per second. We successfully verified Web3Bench on TiDB, an open-source distributed SQL database, as a proof of concept of the feasibility and applicability of the benchmark.

Why Do We Need an HTAP Benchmark?

Benchmarks serve as a yardstick for comparing software products with similar objectives. The database industry has long relied on separate benchmarks to measure online transaction processing (OLTP) and online analytical processing (OLAP) systems. HTAP databases, which handle both workloads, are now becoming more popular in the database community.

The diagram below compares traditional architectures, which keep OLTP and OLAP in separate databases, with HTAP systems, which process both in the same database, possibly with a row store for OLTP and a column store for OLAP.

The rise of HTAP databases calls for benchmarks designed for these systems. We believe existing HTAP benchmarks are not representative because they simply glue a transactional benchmark and an analytical benchmark together. More recent proposals do not address the online aspect of HTAP and reuse data models from earlier benchmarks that do not represent HTAP workloads.

Getting Started with Web3Bench

Built upon the Web3 workload, a prevalent scenario for HTAP and a flourishing domain within decentralized databases, this benchmark distinguishes itself from its counterparts through the following features:

It is based on a Web3 business scenario rather than outdated data models. Web3 needs HTAP databases because it requires both (1) streamlined security and identity management with millisecond latency and (2) complex analytics, such as predictive analytics and investment-strategy recommendations.
It covers the full range of HTAP workloads, including online data serving, which lies between OLTP and OLAP.

Data Model

One of the main objectives of Web3Bench is to bring a new, representative business problem to HTAP benchmarking instead of recycling outdated, artificial data models. To that end, we chose the data model of Ethereum, a decentralized blockchain.

Ethereum's original data is stored in Merkle tree structures, which are not relational tables by nature. The Google BigQuery team built a seven-table relational schema from the Ethereum data model. We simplified this BigQuery model further to four tables: transactions, contracts, blocks, and token_transfers. The diagram below shows these tables and how they are related.
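To make the four-table model concrete, here is a minimal sketch of it using Python's built-in sqlite3 module. Only the columns that appear in the queries in this post are grounded in the source; the other columns (such as value, block_timestamp, and transaction_hash) and the key relationships are assumptions, and the actual Web3Bench schema in the repository is richer.

```python
import sqlite3

# Hypothetical, minimal version of the four Web3Bench tables.
# Columns used by the R/W templates in this post are included; the rest
# (value, block_timestamp, transaction_hash, ...) are illustrative assumptions.
SCHEMA = """
CREATE TABLE blocks (
    number     INTEGER PRIMARY KEY,
    timestamp  INTEGER NOT NULL
);
CREATE TABLE contracts (
    address       TEXT PRIMARY KEY,
    block_number  INTEGER REFERENCES blocks(number)
);
CREATE TABLE transactions (
    hash             TEXT PRIMARY KEY,
    from_address     TEXT NOT NULL,
    to_address       TEXT,
    value            REAL,
    block_number     INTEGER REFERENCES blocks(number),
    block_timestamp  INTEGER
);
CREATE TABLE token_transfers (
    token_address      TEXT REFERENCES contracts(address),
    from_address       TEXT NOT NULL,
    to_address         TEXT NOT NULL,
    value              REAL,
    transaction_hash   TEXT REFERENCES transactions(hash),
    block_number       INTEGER REFERENCES blocks(number),
    next_block_number  INTEGER
);
"""


def create_schema(conn: sqlite3.Connection) -> list[str]:
    """Create the sketch schema and return the table names in sorted order."""
    conn.executescript(SCHEMA)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


print(create_schema(sqlite3.connect(":memory:")))
```

SQLite is used here only because it ships with Python; Web3Bench itself targets TiDB.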

Data Generator

We designed and implemented a custom data generator for Web3Bench. The generator produces data for the four tables in the data model with a configurable scale factor, so systems can be tested at sizes ranging from gigabytes (GB) to petabytes (PB).

The data generator relies on data demographics derived from the Ethereum sample data set so that the generated data reflects the real world.

The demographics are as follows:

Frequency (count distinct) of foreign key (FK) values in all tables
Distinct (from_address, to_address) pairs

The data generator is also multi-threaded so that it can load data into the tables quickly.
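The following Python sketch illustrates demographics-driven, scale-factor-based generation for one of the demographics above, the (from_address, to_address) pair frequencies. The sample counts, ROWS_PER_SCALE_FACTOR, and the function names are hypothetical, and the sketch assumes that row counts grow linearly with the scale factor; it is not the Web3Bench generator.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical demographics: how often each (from_address, to_address) pair
# appears in a sample. Web3Bench derives real demographics from Ethereum data.
SAMPLE_PAIR_COUNTS = {
    ("0xaaa", "0xbbb"): 50,
    ("0xaaa", "0xccc"): 30,
    ("0xddd", "0xbbb"): 20,
}

ROWS_PER_SCALE_FACTOR = 1_000  # assumption: rows scale linearly with the scale factor


def generate_chunk(chunk_id: int, n_rows: int, seed: int) -> list[tuple[str, str]]:
    """Draw n_rows (from, to) pairs, weighted by their sample frequencies."""
    rng = random.Random(seed + chunk_id)  # per-chunk RNG: reproducible, no shared state
    pairs = list(SAMPLE_PAIR_COUNTS)
    weights = list(SAMPLE_PAIR_COUNTS.values())
    return rng.choices(pairs, weights=weights, k=n_rows)


def generate(scale_factor: int, threads: int = 4, seed: int = 42) -> list[tuple[str, str]]:
    """Generate scale_factor * ROWS_PER_SCALE_FACTOR rows using a thread pool."""
    total = scale_factor * ROWS_PER_SCALE_FACTOR
    # One chunk per worker; the last chunk absorbs the remainder.
    base = total // threads
    sizes = [base] * (threads - 1) + [total - base * (threads - 1)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        chunks = pool.map(generate_chunk, range(threads), sizes, [seed] * threads)
        return [row for chunk in chunks for row in chunk]


rows = generate(scale_factor=2)
print(len(rows), rows[:3])
```

In a real loader each worker would also insert its chunk over its own database connection; that I/O is where threads pay off, since CPython's GIL limits parallelism for pure CPU work.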

Read Workload

As discussed earlier, one of our contributions is to cover the full spectrum of HTAP queries, from real-time queries to OLAP queries, rather than only the two extremes, as an earlier benchmark did.

We introduce a new workload category called online serving (O for short), whose latency requirement is tunable from 1 to 10 seconds.

We classify queries by their latency requirements:

T queries: latency of at most 0.1 seconds.
O queries: response time within 1 second.
O queries: response time within 10 seconds.
A queries: no explicit latency requirement. Here throughput is the critical factor for distinguishing systems.
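The latency classes above can be written as a small lookup, which is also one way a driver could decide whether a request met its requirement. The budgets follow the list above and the template assignments follow the Queries section below; the keys "O1"/"O10" and the function name are our own.

```python
from typing import Optional

# Latency budget in seconds per class; None means no explicit requirement (A).
LATENCY_BUDGET_S: dict[str, Optional[float]] = {
    "T": 0.1,     # point queries
    "O1": 1.0,    # online serving, 1-second limit
    "O10": 10.0,  # online serving, 10-second limit
    "A": None,    # analytical queries: judged on throughput instead
}

# Query templates grouped by class, as described in the Queries section.
TEMPLATE_CLASS = {
    "R1": "T",
    "R21": "O1", "R22": "O1",
    "R23": "O10", "R24": "O10", "R25": "O10",
    "R31": "A", "R32": "A", "R33": "A", "R34": "A", "R35": "A",
}


def meets_budget(template: str, latency_s: float) -> bool:
    """Return True if a measured latency satisfies the template's class budget."""
    budget = LATENCY_BUDGET_S[TEMPLATE_CLASS[template]]
    return budget is None or latency_s <= budget


print(meets_budget("R24", 3.04))   # True
print(meets_budget("R24", 65.24))  # False
```

With the averages reported later in this post, R24 meets its 10-second budget at 3.04 s but would miss it at 65.24 s, its average latency without TiFlash.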

Queries

As mentioned earlier, we classify the queries into three categories: T, O, and A. Let's look at each in turn.

T Queries

T queries retrieve a single transaction by its hash, which uniquely identifies the transaction, so they are also called point queries. The query template “R1” for these queries is:

SELECT to_address, from_address
FROM transactions
WHERE hash = ?
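Because R1 looks up a single row by its unique hash, an index on hash turns it into a direct lookup rather than a scan. The sketch below runs R1 against an in-memory SQLite table with two made-up rows; the ? is a bind placeholder filled in at execution time, and SQLite stands in for TiDB only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in transactions table; the rows are hypothetical sample data.
conn.execute(
    "CREATE TABLE transactions (hash TEXT PRIMARY KEY, from_address TEXT, to_address TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("0x01", "0xaaa", "0xbbb"), ("0x02", "0xccc", "0xddd")],
)

R1 = "SELECT to_address, from_address FROM transactions WHERE hash = ?"


def run_r1(conn: sqlite3.Connection, tx_hash: str):
    """Execute template R1 with the transaction hash bound to the placeholder."""
    return conn.execute(R1, (tx_hash,)).fetchone()


def r1_plan(conn: sqlite3.Connection) -> list[str]:
    """Return SQLite's plan text for R1; the primary-key index yields a SEARCH."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + R1, ("0x01",))]


print(run_r1(conn, "0x02"))  # ('0xddd', '0xccc')
print(r1_plan(conn))
```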

O Queries

O queries have a latency requirement of at most 1 second. There are two query templates in this class: R21 retrieves the five most recent token transfers sent from a given address, and R22 counts the transfers of a given token.

Template 1 “R21”

SELECT *
FROM token_transfers
WHERE from_address = ?
ORDER BY block_number DESC
LIMIT 5

Template 2 “R22”

SELECT count(*)
FROM token_transfers
WHERE token_address = ?
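Both templates filter on a single column, so they benefit from indexes; the results later in this post show R21 and R22 slowing to 0.66 s and 0.74 s on average without them. The sketch below runs R21 and R22 in SQLite with two indexes chosen by hand for illustration (idx_tt_from_block and idx_tt_token are hypothetical, not the output of the authors' index advisor), using the token_transfers table name from the data model and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE token_transfers ("
    " token_address TEXT, from_address TEXT, to_address TEXT, block_number INTEGER)"
)
# Hypothetical indexes: (from_address, block_number) serves R21's filter and
# ORDER BY; token_address serves R22's count.
conn.execute("CREATE INDEX idx_tt_from_block ON token_transfers (from_address, block_number)")
conn.execute("CREATE INDEX idx_tt_token ON token_transfers (token_address)")

# Ten transfers of one token in blocks 1..10, plus one of another token in block 11.
conn.executemany(
    "INSERT INTO token_transfers VALUES (?, ?, ?, ?)",
    [("0xtok", "0xaaa", "0xbbb", n) for n in range(1, 11)]
    + [("0xoth", "0xaaa", "0xccc", 11)],
)

R21 = ("SELECT * FROM token_transfers WHERE from_address = ? "
       "ORDER BY block_number DESC LIMIT 5")
R22 = "SELECT count(*) FROM token_transfers WHERE token_address = ?"

latest = conn.execute(R21, ("0xaaa",)).fetchall()
print([row[3] for row in latest])                 # [11, 10, 9, 8, 7]
print(conn.execute(R22, ("0xtok",)).fetchone()[0])  # 10
```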

A further subcategory of O queries has a latency requirement of at most 10 seconds.

Template 3 “R23”

SELECT *
FROM token_transfers 
WHERE token_address = ?
AND block_number <= ?
AND (next_block_number > ? OR next_block_number = ?)
ORDER BY block_number DESC
LIMIT 5

Template 4 “R24”

SELECT count(*)
FROM transactions
WHERE to_address not in (?, ?, ?)

Template 5 “R25”

SELECT count(*) 
FROM token_transfers 
WHERE next_block_number <= block_number
GROUP BY next_block_number

A Queries

This category contains queries with no latency limitations. These are generally complex queries with a lot of computation and data movement.

There are five queries in this category, and you can find their descriptions below. The SQL queries can be found in this repository.

R31: Find transactions in which a given person is either the sender or the receiver, limiting the result to the most recent timestamp.
R32: Find the top N transactions by block timestamp.
R33: Find the number of unique senders (from_address) in transactions.
R34: Find the top N senders (from_address) by total transaction value.
R35: Compute the total count of token transfers for a specific sender and of token transfers to recipients who are also senders in other transactions.
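The exact A queries live in the repository; as a rough illustration, here is one way R33 and R34 could be expressed, run against a tiny SQLite table. These are our own readings of the one-line descriptions, not the repository's SQL, and the value column and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (hash TEXT PRIMARY KEY, from_address TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("0x1", "0xaaa", 5.0), ("0x2", "0xbbb", 7.0),
     ("0x3", "0xaaa", 4.0), ("0x4", "0xccc", 1.0)],
)

# R33 (sketch): number of unique senders.
R33 = "SELECT COUNT(DISTINCT from_address) FROM transactions"

# R34 (sketch): top-N senders by total transaction value; N is bound to LIMIT ?.
R34 = """
SELECT from_address, SUM(value) AS total_value
FROM transactions
GROUP BY from_address
ORDER BY total_value DESC
LIMIT ?
"""

print(conn.execute(R33).fetchone()[0])     # 3
print(conn.execute(R34, (2,)).fetchall())  # [('0xaaa', 9.0), ('0xbbb', 7.0)]
```

On a real HTAP system, full-table aggregations like these are the kind of query the column store is meant to accelerate.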

Write Workload

Writes (W for short) are an important part of HTAP. They include inserts, updates, and deletes; most are single-row inserts or small-batch inserts. The following is a high-level description of the write operations in Web3Bench.

Inserts
- W11, W12, W13, and W14: Single-row inserts into each of the four tables.
- W2: Small-batch inserts (100 rows) into the transactions table.
- W3: Insert 1,000 rows into the transactions table from a temporary table.
Updates
- W4: Single-row update of the transactions table.
- W5: Join update of the token_transfers table.
Deletes
- W6: Single-row delete from the transactions table.
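To show what the W2, W4, and W6 shapes look like from a client, here is a SQLite sketch: a 100-row batch insert in one transaction, a single-row update, and a single-row delete, each keyed by hash. The table columns, the hash format, and all function names are hypothetical.

```python
import sqlite3


def create_transactions(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE transactions ("
        " hash TEXT PRIMARY KEY, from_address TEXT, to_address TEXT, value REAL)"
    )


def tx_hash(n: int) -> str:
    """Hypothetical 32-byte hex hash derived from a counter."""
    return f"0x{n:064x}"


def w2_batch_insert(conn: sqlite3.Connection, start: int, batch_size: int = 100) -> None:
    """W2-style small batch: batch_size rows committed as one transaction."""
    rows = [(tx_hash(start + i), "0xaaa", "0xbbb", float(i)) for i in range(batch_size)]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)


def w4_point_update(conn: sqlite3.Connection, h: str, new_value: float) -> int:
    """W4-style single-row update; returns the number of rows changed."""
    with conn:
        cur = conn.execute("UPDATE transactions SET value = ? WHERE hash = ?", (new_value, h))
        return cur.rowcount


def w6_point_delete(conn: sqlite3.Connection, h: str) -> int:
    """W6-style single-row delete; returns the number of rows removed."""
    with conn:
        cur = conn.execute("DELETE FROM transactions WHERE hash = ?", (h,))
        return cur.rowcount


conn = sqlite3.connect(":memory:")
create_transactions(conn)
w2_batch_insert(conn, start=0)
print(w4_point_update(conn, tx_hash(5), 42.0), w6_point_delete(conn, tx_hash(5)))  # 1 1
```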

We developed a driver that submits read (R) and write (W) requests over 60 minutes at varying frequencies. Complex reads (R3 queries, R31 through R35) and complex writes (batch inserts and join updates) run once. These frequencies are averages derived from analyzing TiDB’s Web3 customers, but Web3Bench allows researchers to adjust them to their own needs. Request frequencies are shown in the fourth column of the results table below. Read requests (R1 through R35) make up 70% of the workload, while writes (W11 through W6) make up 30%. R1 queries are the most common, followed by O queries with 1-second and 10-second limits. Single-row inserts dominate the writes.
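Assuming the frequencies in the results table cover one 60-minute run and that requests of each type are spread evenly, each frequency translates directly into a submission rate the driver has to sustain. A small helper (the names are ours) makes the arithmetic explicit:

```python
RUN_SECONDS = 60 * 60  # the driver's 60-minute run

# Request frequencies from the fourth column of the results table.
FREQUENCY = {"R1": 5_760_000, "R21": 57_600, "R24": 1_000, "W11": 288_000}


def per_second(freq: int, run_seconds: int = RUN_SECONDS) -> float:
    """Average requests per second needed to reach freq within the run."""
    return freq / run_seconds


def interval_seconds(freq: int, run_seconds: int = RUN_SECONDS) -> float:
    """Average gap between consecutive submissions of one request type."""
    return run_seconds / freq


for name, freq in FREQUENCY.items():
    print(f"{name}: {per_second(freq):g} per second, one every {interval_seconds(freq):g} s")
```

Under these assumptions R1 runs at 1,600 requests per second, W11 at 80 per second, and R24 once every 3.6 seconds.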

For our workload, the driver (1) submits the above reads and writes concurrently and (2) collects the response time of each request. T queries, O queries, and single-row writes receive a pass/fail decision based on their latency requirements. The driver also computes simple metrics such as the average latency per request type and the total latency for the whole workload.
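The sketch below shows that driver loop in Python: it submits requests concurrently from a thread pool, times each one with time.perf_counter, and reports per-type average latency, total latency, and a pass/fail flag. The request callables, the budgets, and the rule that a type passes only if every request meets its budget are assumptions for illustration; the real Web3Bench driver may decide pass/fail differently.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Iterable, Optional


def timed(req_type: str, fn: Callable[[], object]) -> tuple[str, float]:
    """Run one request and return its type and response time in seconds."""
    start = time.perf_counter()
    fn()
    return req_type, time.perf_counter() - start


def run_driver(
    requests: Iterable[tuple[str, Callable[[], object]]],
    budgets: dict[str, Optional[float]],
    workers: int = 8,
):
    """Submit requests concurrently, then summarize latency per request type.

    budgets maps a request type to its latency limit in seconds;
    a missing type or None means no latency requirement.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(timed, t, fn) for t, fn in requests]
        results = [f.result() for f in futures]

    by_type: dict[str, list[float]] = defaultdict(list)
    for req_type, latency in results:
        by_type[req_type].append(latency)

    summary = {}
    for req_type, values in by_type.items():
        budget = budgets.get(req_type)
        summary[req_type] = {
            "count": len(values),
            "avg_latency": mean(values),
            "total_latency": sum(values),
            # Hypothetical rule: a type passes only if every request met its budget.
            "passed": budget is None or max(values) <= budget,
        }
    total_latency = sum(s["total_latency"] for s in summary.values())
    return summary, total_latency


# Stand-in requests: instant "R1" lookups and "W4" updates that take 10 ms.
reqs = [("R1", lambda: None)] * 5 + [("W4", lambda: time.sleep(0.01))] * 2
summary, total = run_driver(reqs, {"R1": 0.1, "W4": 0.001})
print(summary["R1"]["passed"], summary["W4"]["passed"], round(total, 3))
```

In a real run each callable would execute one R or W template against the database, and submissions would be paced to match the frequencies in the results table.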

HTAP Benchmark Testing: Experimenting with TiDB

Numerous systems can handle HTAP workloads. However, TiDB stands out as a significant contender because it can cater to the entire spectrum of T, O, and A use cases. We chose TiDB as the testing platform for Web3Bench because it is open source, simple to set up, and compatible with MySQL. Note that the objective of the test is not to evaluate TiDB’s performance but to demonstrate the feasibility and relevance of Web3Bench.

We generated a data set with a scale factor of 6000, which translates to around 500 GB of raw data. We used our index advisor to create indexes that improved the performance of some of the queries. The system configuration also included TiFlash, the columnar storage component of TiDB, to handle the OLAP part of the benchmark.
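For readers reproducing a similar setup, TiFlash replicas are added per table with TiDB's ALTER TABLE ... SET TIFLASH REPLICA statement, replication progress is visible in information_schema.tiflash_replica, and the tidb_isolation_read_engines session variable restricts which storage engines a session reads from. The helper below only builds those SQL strings for the four Web3Bench tables; using the session variable for a no-columnar run is one possible approach, not necessarily how the authors ran their test.

```python
WEB3BENCH_TABLES = ("blocks", "contracts", "transactions", "token_transfers")


def tiflash_replica_statements(replicas: int = 1, tables=WEB3BENCH_TABLES) -> list[str]:
    """Build ALTER TABLE statements that set the TiFlash replica count per table.

    replicas=0 removes the TiFlash replicas again.
    """
    return [f"ALTER TABLE {t} SET TIFLASH REPLICA {replicas};" for t in tables]


# Check replication progress before running the OLAP queries.
CHECK_PROGRESS = (
    "SELECT TABLE_NAME, AVAILABLE, PROGRESS "
    "FROM information_schema.tiflash_replica;"
)

# Session-level alternative for a row-store-only run: read from TiKV
# (plus TiDB's in-memory system tables) but not from TiFlash.
ROW_STORE_ONLY = "SET SESSION tidb_isolation_read_engines = 'tikv,tidb';"

for stmt in tiflash_replica_statements():
    print(stmt)
```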

The table below shows the system configuration we used for the test. We used three TiDB servers, which are stateless nodes that receive and parse client requests, optimize them, and generate execution plans. We also used three TiKV nodes for the row store and two TiFlash nodes for the column store. Note that TiFlash requires more hardware resources because it handles more complex queries than TiKV.

Node type	Count	CPUs per node	Memory per node	Storage per node
TiDB	3	16	128 GB	N/A
TiKV	3	16	64 GB	1 TB
TiFlash	2	64	128 GB	1 TB

Exploring HTAP Benchmark Testing

Column 2 of the results table below shows which storage and compute engine serves each workload component. OLTP mainly uses the row store (TiKV), online serving uses both TiKV and TiFlash, and OLAP mainly relies on TiFlash. Writes update raw data and indexes in TiKV and are also propagated to TiFlash.

The third column (Requests) lists the read and write requests described in the workload section above. The fourth column shows the request frequency, that is, how many times each request was executed. Overall, read requests (R1 through R35) make up 70% of the workload versus 30% for writes (W11 through W6). R1 queries are the most common, followed by O queries with a 1-second limit and then O queries with a 10-second limit. For writes, single-row inserts, updates, and deletes are the most common.

Measuring HTAP Benchmark Performance

We measured performance by average latency (the fifth column). To illustrate the effect of row store and column store optimizations, we also ran the benchmark without indexes (the sixth column) and without TiFlash (the last column). Overall, removing indexes and removing the column store caused roughly 10X and 7X regressions, respectively, based on the total latencies of the three tests. Removing indexes had no impact on R1, while R21, R22, R23, and R32 degraded. Removing TiFlash degraded R24, R25, R31, R33, R34, and R35.
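The Total row for reads in the results table appears to be the sum of frequency × average latency over all read requests, so the regression factors can be checked directly from the table. The sketch below recomputes the three totals from the per-request averages; the small gap from the reported totals (12,261 s, 118,466 s, and 88,707 s) comes from rounding in the published averages.

```python
# (frequency, avg latency, avg latency no indexes, avg latency no columnar), in seconds,
# copied from the read rows of the results table.
READS = {
    "R1":  (5_760_000, 0.00061, 0.00061, 0.00061),
    "R21": (57_600, 0.0015, 0.66, 0.0015),
    "R22": (57_600, 0.062, 0.74, 0.062),
    "R23": (57_600, 0.0037, 0.51, 0.0037),
    "R24": (1_000, 3.04, 3.04, 65.24),
    "R25": (1_000, 1.86, 1.86, 14.10),
    "R31": (1, 0.0064, 0.0064, 18.70),
    "R32": (1, 0.16, 47.40, 0.16),
    "R33": (1, 2.99, 2.99, 369.30),
    "R34": (1, 5.81, 5.81, 1_358.10),
    "R35": (1, 12.20, 12.20, 249.30),
}


def total_latency(column: int) -> float:
    """Sum of frequency x average latency for one configuration column (1, 2 or 3)."""
    return sum(row[0] * row[column] for row in READS.values())


baseline, no_index, no_columnar = (total_latency(c) for c in (1, 2, 3))
print(f"no indexes: {no_index / baseline:.1f}x, no TiFlash: {no_columnar / baseline:.1f}x")
```

The recomputed ratios come out at about 9.6x and 7.2x; the reported totals give about 9.7x and 7.2x, which the text rounds to 10X and 7X.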

We also measured the effect of indexes and TiFlash on writes, since both are expected to add write overhead. Removing them lowered total write latency only modestly, from 2,213 s to 2,014 s without indexes and to 1,853 s without TiFlash, far less than the benefit these structures provide for reads. In summary, our experiments show that Web3Bench is a representative benchmark for HTAP, with a clear impact of row store and column store optimizations across the workload.
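The trade-off can be quantified from the reported totals in the results table: removing a structure saves a little write time but costs far more read time. A short calculation (the names are ours):

```python
# Reported totals in seconds: (all structures, no indexes, no TiFlash).
READ_TOTALS = (12_261, 118_466, 88_707)
WRITE_TOTALS = (2_213, 2_014, 1_853)


def tradeoff(config: int) -> tuple[int, int]:
    """Return (write seconds saved, read seconds lost) for configuration 1 or 2."""
    write_saved = WRITE_TOTALS[0] - WRITE_TOTALS[config]
    read_lost = READ_TOTALS[config] - READ_TOTALS[0]
    return write_saved, read_lost


for name, config in (("indexes", 1), ("TiFlash", 2)):
    saved, lost = tradeoff(config)
    print(f"dropping {name}: saves {saved} s of writes, costs {lost} s of reads")
```

Dropping indexes saves 199 s of write time but adds 106,205 s of read time; dropping TiFlash saves 360 s but adds 76,446 s.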

The Results

Workload component	TiDB engine	Requests	Request frequency	Average latency (s)	Average latency, no indexes (s)	Average latency, no columnar store (s)
OLTP	TiKV	R1	5,760,000	0.00061	0.00061	0.00061
Online	TiKV and TiFlash	R21	57,600	0.0015	0.66	0.0015
		R22	57,600	0.062	0.74	0.062
		R23	57,600	0.0037	0.51	0.0037
		R24	1000	3.04	3.04	65.24
		R25	1000	1.86	1.86	14.10
OLAP	TiFlash	R31	1	0.0064	0.0064	18.70
		R32	1	0.16	47.40	0.16
		R33	1	2.99	2.99	369.30
		R34	1	5.81	5.81	1,358.10
		R35	1	12.20	12.20	249.30
		Total	5,934,805	12,261	118,466	88,707


Write	TiKV and TiFlash	W11	288,000	0.000079	0.000075	0.000063
		W12	288,000	0.000108	0.000082	0.000065
		W13	288,000	0.000108	0.000085	0.000067
		W14	288,000	0.000109	0.000089	0.000062
		W2	1	0.00323	0.0028	0.00212
		W3	1	0.0543	0.0458	0.038
		W4	288,000	0.0028	0.0026	0.0022
		W51	1	15.10	14.00	12.00
		W52	1	1070.40	974.00	966.20
		W6	288,000	0.00071	0.00063	0.00058
		Total	1,728,004	2,213	2,014	1,853
			Grand Total	14,474	120,480	90,560

You can find the benchmark here. The repository has clear instructions on prerequisites, TiDB setup, loading data (with desired scale factor), and how to run the benchmark.

Conclusion

This post introduced Web3Bench, an innovative HTAP benchmark inspired by the growing Web3 business paradigm. This benchmark encompasses read operations, ranging from real-time interactions to online serving and batch processing. On the write side, it involves tasks such as inserts, updates, and deletes, each characterized by varying levels of complexity.

Finally, the authors express gratitude to Hanumath Maduri from Workday, Zhongxin Ge, Guoxin Kang, and Ryan Zhang for their major contributions to this benchmark’s design and implementation.

For more details, please check out our white paper, Blitzscaling Your Web3 Business with a HTAP Database, to explore how other Web3 companies have dealt with their modern data challenges.
