Opera, headquartered in Oslo and listed on NASDAQ, is a global web innovator with an engaged and growing base of hundreds of millions of monthly active users who seek a better internet experience. Today, Opera offers users around the world a range of products and services that include PC and mobile browsers, the newsreader Opera News, and apps dedicated to gaming, e-commerce and classifieds.
Recently, Opera proudly celebrated the two-year anniversary of Opera Ads, which has become a force to be reckoned with – it reaches and engages millions of Opera users worldwide with innovative, content-based Ad experiences from Opera’s global inventory, across its portfolio of products and partner inventories.
This article will share how TiDB, a distributed SQL database, has helped Opera Ads to solve challenges with the Data Management Platform (DMP).
Understanding Opera’s business challenges
Opera Ads initially built their backend storage on MySQL. However, the amount of data to process grew enormously, and the database couldn’t keep up with the requirements. This prompted Opera to seek more reasonable data management solutions for their DMP. The challenges mostly came from two scenarios:
- Audience analysis. In this business scenario, Opera Ads gets an estimated number of targeted audiences through configured tags such as gender, age, country, and language. Based on the estimation, they will know the scope and the number of audiences the advertisement can cover.
- Targeting. In this scenario, Opera Ads leverages user profile data, or first-party signals such as interest and preferences, for advertising targeting. The DMP sorts and analyses the data. Then, Opera Ads updates the target setting to make sure that similar advertisements reach the target audience again and again.
Explosive data volume
The prime challenge with these two scenarios is the exponentially growing data volume that comes along with high-frequency reads and writes. For example, the largest table needs to support billions of rows of data, and the overall data volume can reach 10 TB.
Audience analysis requires both offline calculation and real-time calculation that returns within seconds, to identify how large an audience the targeted advertising can cover. Based on the calculation results, advertising operators know whether the advertising configurations are reasonable and whether the ads are ready to be posted.
Investigating possible solutions
Using their storage and calculation needs as a guide, Opera Ads investigated database solutions such as MongoDB, Hbase, Hbase+ES, and ClickHouse. However, none of them met their expectations. After some further exploration and testing, they finally chose TiDB.
Opera used MongoDB in other projects to store data and create analysis reports. However, they found that as the amount of data increased, the performance quickly dropped. After the number of rows exceeded a billion, the performance could not keep up with Opera Ads' requirements for mass storage and real-time calculations.
Opera Ads also considered replacing MongoDB with HBase and storing user data in Hadoop. HBase can support a large amount of data, but it is relatively complex to use and maintain. Troubleshooting can also be hard, because there are numerous moving parts to examine.
In addition, they tried to use ElasticSearch (ES) for real-time calculations. Queries in ES need a large amount of memory for caching to achieve high performance, so using HBase and ES together would increase their business cost. What they were looking for was a SQL-oriented solution with great storage scalability. Therefore, HBase+ES didn’t fit their requirements either.
As a popular solution in the ads industry, ClickHouse was tempting for Opera Ads because it has native support for bitmaps, which are used heavily for various real-time calculations in audience analysis. With this support, ClickHouse can directly store the data as a bitmap. If the content management system (CMS) wants to query the intersection or union of the user groups corresponding to two bitmaps, it can read the result directly from the underlying database.
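To make the bitmap idea concrete, here is a minimal sketch in plain Python. It is not ClickHouse's implementation; it only assumes that each user has a small integer ID and uses a Python integer as an uncompressed bitmap. The tag names and user IDs are made up for illustration.

```python
def to_bitmap(user_ids):
    """Return an integer whose bit i is set for every user ID i in the group."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def count_users(bitmap):
    """Number of users in the group (count of set bits)."""
    return bin(bitmap).count("1")


# Hypothetical user groups, one bitmap per tag.
female = to_bitmap([1, 2, 5, 8])
norway = to_bitmap([2, 3, 5, 9])

both = female & norway    # intersection: users carrying both tags
either = female | norway  # union: users carrying at least one tag

print(count_users(both), count_users(either))
```

Intersection and union each reduce to one bitwise operation over the two bitmaps, which is why this representation suits interactive audience queries. Production systems usually rely on compressed bitmap formats rather than raw integers.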
However, ClickHouse is designed for Online Analytical Processing (OLAP) scenarios and could not handle data update operations well. At Opera Ads, user data needs to be updated each day. There are more update operations than full insert operations. Therefore, ClickHouse is not a good fit.
The final choice: TiDB
TiDB was initially known by Opera Ads as a “big and better MySQL database with horizontal scalability.” At that time, they were not sure whether TiDB could meet the requirements, so they compared TiDB and MySQL. Although TiDB was not particularly fast, its performance was stable as data volume increased. For example, when the table in the test grew from 1 million rows to 1 billion rows, the overall performance jitter was small. This solved their scalability and performance issues very well.
Another key factor that led Opera Ads to TiDB is that TiDB and bitmaps could be well integrated in their business scenarios, which rely heavily on real-time calculations with bitmaps. In addition, TiDB is a MySQL-compatible database, so it does not add any extra learning burden or operational cost.
From the time Opera Ads started testing TiDB to the actual implementation, it only took about a week.
TiDB deployment and optimization
Opera Ads started with TiDB 2.14 and encountered many challenges. When they installed TiDB, it had a pre-installation script for testing the performance of various machines, including that of SSDs. The script required 40,000 operations per second (OPS) for pure read performance and 10,000 for random read/write. However, the machine at the time had a less than ideal hardware configuration and failed the performance check. They had to install TiDB on reusable virtual machines. After many adjustments, they lowered the pure read OPS to 20,000 and random read/write to 3,000 and passed the check for all machines.
After Opera Ads deployed TiDB, they made some optimizations. For example, when they performed a full table scan with serial scanning, the performance was slow. Their offline calculation needed a full scan of four or five tables that could have billions of rows. The widest table had more than 30 columns. With this heavy workload, the offline calculation would take three or four days. Later, they optimized the performance by doing the scan in high concurrency. For example, they used 32 threads to scan a table with 1 billion rows, and each thread scanned 1/32 of the total. This way, TiDB performed quite well and was stable for the offline calculations. Currently, TiDB is able to keep up with the growing data at Opera Ads.
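The concurrent scan described above can be sketched as follows. This is an illustration under stated assumptions, not Opera Ads' actual code: it assumes the table has an integer key that can be split into contiguous ranges, and `scan_chunk` is a hypothetical callback that, in a real setup, would run a range query (for example `WHERE id BETWEEN lo AND hi`) on its own database connection. Here it is replaced by a stand-in that just counts the IDs in its range.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(min_id, max_id, parts):
    """Split the inclusive range [min_id, max_id] into at most `parts` contiguous chunks."""
    total = max_id - min_id + 1
    base, extra = divmod(total, parts)
    chunks, start = [], min_id
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def parallel_scan(scan_chunk, min_id, max_id, workers=32):
    """Run scan_chunk(lo, hi) for each chunk on a thread pool and collect the results."""
    chunks = split_range(min_id, max_id, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda chunk: scan_chunk(*chunk), chunks))


def fake_scan_chunk(lo, hi):
    """Stand-in for a real range query: returns how many IDs fall in [lo, hi]."""
    return hi - lo + 1


chunks = split_range(1, 1000, 32)
results = parallel_scan(fake_scan_chunk, 1, 1000, workers=32)
print(len(chunks), sum(results))
```

Each worker handles roughly 1/32 of the key range, so no single query has to walk the whole table. If keys are unevenly distributed, equal-width ranges will hold unequal row counts, and a real implementation would need to account for that.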
After choosing TiDB, Opera Ads also decided on an architecture that combines TiDB, bitmaps, and Redis. The audience analysis scenario needs to get the target audience coverage using real-time calculation, while the bitmap-related targeting advertising uses offline calculation.
The figure below shows the data processing architecture for their advertising business. Data from different sources is written to TiDB and then divided into two flows. One flow is to Tag Tools, an offline calculation tool that Opera Ads developed. TiDB writes all the relevant data to this tool. Then, the data is cached offline in Redis to perform offline calculations in the DMP server, where user profile data is provided to the advertising engine. Reading the final data from Redis ensures that the advertising engine can get the data in real time. There are many sources of user data, and the data transmitted through different channels will eventually be ingested into TiDB, while some data will be written to Redis directly. TiDB stores the full user data and tag data, consisting of a billion users and about 2 TB of data.
Opera Ads DMP architecture
Another TiDB usage is the reached audience calculation. When Ads operators configure ads campaigns in the Opera Ads platform, the estimated number of audiences can be shown instantly, thanks to the real-time calculations based on bitmaps. Each tag in the database corresponds to a bitmap, and each bitmap stores all users under the same tag. Bitmaps are generated offline based on the data from TiDB and are saved in a file. Then, the bitmap file is loaded into the CMS memory for real-time audience calculation. The results must be returned within seconds, if not faster.
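The tag-to-bitmap pipeline described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the row layout, the tag names, the one-file-per-tag format, and the use of Python integers as uncompressed bitmaps. The article does not describe Opera Ads' actual file format or code.

```python
import os
import tempfile
from collections import defaultdict


def build_tag_bitmaps(rows):
    """Offline step: turn (user_id, tags) rows into one bitmap per tag."""
    bitmaps = defaultdict(int)
    for user_id, tags in rows:
        for tag in tags:
            bitmaps[tag] |= 1 << user_id
    return dict(bitmaps)


def save_bitmap(path, bitmap):
    """Write a bitmap to a file as little-endian bytes."""
    length = (bitmap.bit_length() + 7) // 8 or 1
    with open(path, "wb") as f:
        f.write(bitmap.to_bytes(length, "little"))


def load_bitmap(path):
    """Read a bitmap back into memory."""
    with open(path, "rb") as f:
        return int.from_bytes(f.read(), "little")


def estimate_audience(bitmaps, required_tags):
    """Real-time step: count users that carry every required tag."""
    tags = list(required_tags)
    if not tags:
        return 0
    result = bitmaps.get(tags[0], 0)
    for tag in tags[1:]:
        result &= bitmaps.get(tag, 0)
    return bin(result).count("1")


# Hypothetical rows as they might be read from the user table.
rows = [
    (0, ["gender_female", "country_NO"]),
    (1, ["gender_male", "country_NO"]),
    (2, ["gender_female", "country_SE"]),
    (3, ["gender_female", "country_NO", "lang_en"]),
]

with tempfile.TemporaryDirectory() as tmp:
    for tag, bitmap in build_tag_bitmaps(rows).items():
        save_bitmap(os.path.join(tmp, tag + ".bin"), bitmap)
    # The serving side loads every bitmap file into memory once.
    loaded = {name[:-4]: load_bitmap(os.path.join(tmp, name)) for name in os.listdir(tmp)}

estimate = estimate_audience(loaded, ["gender_female", "country_NO"])
print(estimate)
```

Once the bitmaps are in memory, an estimate for any tag combination takes only a few bitwise operations and never touches the database, which is consistent with results being returned within seconds.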
The benefits of using TiDB for Opera Ads
Easy to use and maintain
Thanks to the compatibility with MySQL, Opera Ads easily got comfortable with TiDB without any extra learning cost. TiDB also has a very robust community ecosystem, and most of the problems can be solved in time via community support. In addition, PingCAP provided timely technical support during the deployment, operation, and upgrade phases.
Scale out horizontally and elastically with no sharding needed
Another obvious benefit is that DBAs at Opera Ads no longer need to care about the amount of data in a table. When they used MySQL, they worried all the time about whether the data volume for a table was excessively large, or whether they needed to shard the table. After all, sharding not only affects the developers who write the code but also causes trouble for the DBAs. After Opera Ads put TiDB in production, all these worries were gone.
Real-time calculation for analytic scenarios
Another advantage of TiDB lies in analytic scenarios. Most of the business scenarios of Opera Ads involve analytics based on massive data volumes. As data continues to grow, the pressure on read/write performance becomes increasingly high. For such scenarios, the disadvantages of MySQL as a standalone database become more apparent, because calculations take longer as data scales to a large volume. In addition, MySQL can only do single-point calculations, which means out-of-memory (OOM) failures are inevitable at that scale.
TiDB’s read/write performance is quite stable as data grows. With the Massively Parallel Processing (MPP) feature introduced in TiDB 5.0, performance in Opera Ads' real-time scenarios is expected to improve further.
With the continuous support of the TiDB community, Opera Ads has upgraded from the initial version of TiDB, 2.14, to 4.0. After upgrading the hardware, Opera Ads plans on a full upgrade to TiDB 5.0, which has improved performance and stability compared to the current version. “We hope to achieve an overall upgrade of the data management platform through TiDB’s Hybrid Transactional and Analytical Processing (HTAP) capability, making our advertising platform more real-time, accurate and efficient,” said Yuanlin Zhou, Senior Engineering Director at Opera.