{"id":32819,"date":"2026-04-07T06:00:46","date_gmt":"2026-04-07T13:00:46","guid":{"rendered":"https:\/\/www.pingcap.com\/?p=32819"},"modified":"2026-04-10T11:56:00","modified_gmt":"2026-04-10T18:56:00","slug":"tidb-8-5-reduce-p999-latency-distributed-database","status":"publish","type":"post","link":"https:\/\/www.pingcap.com\/ko\/blog\/tidb-8-5-reduce-p999-latency-distributed-database\/","title":{"rendered":"Reducing P999 Latency in Distributed Databases with TiDB 8.5"},"content":{"rendered":"<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Key_Takeaways\"><\/span><strong>Key Takeaways<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tail latency \u2014 not averages \u2014 is what breaks SLOs in distributed OLTP systems. <\/li>\n\n\n\n<li>Production clusters saw P999 drop from tens of seconds to sub-100ms after upgrading to TiDB 8.5, with no workload changes. <\/li>\n\n\n\n<li>The largest gains came from eliminating rare-but-catastrophic stalls: GC pauses, lock contention, and storage snapshot overhead. <\/li>\n\n\n\n<li>TiDB 8.5 optimizes across the full stack by removing unnecessary work, replacing expensive operations, and reordering tasks off the critical path. <\/li>\n\n\n\n<li>High-QPS, large-scale workloads with strict latency requirements benefit most.<\/li>\n<\/ul>\n<\/blockquote>\n\n\n\n<p>Reducing P999 latency in distributed databases is one of the hardest challenges in modern OLTP systems. A handful of slow requests can cascade across services, break SLOs, and directly impact business outcomes, especially in latency-sensitive environments like trading platforms and real-time applications.<\/p>\n\n\n\n<p>This is the challenge of tail latency. 
As systems scale, variability compounds: queueing amplifies small delays, fan-out turns rare slow sub-requests into frequent user-facing issues, and hidden bottlenecks across the stack create unpredictable spikes.<\/p>\n\n\n\n<p>In practice, it\u2019s not the median that hurts\u2014it\u2019s the P99 and P999. With TiDB 8.5, we address this at the root\u2014not by improving averages, but by systematically reducing latency variance.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_We_Observed_in_Production\"><\/span>What We Observed in Production<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>TiDB 8.5 is a performance-focused release. In a controlled, in-place production upgrade (same workload and configuration; only the TiDB\/TiKV kernel version changed from v7.5.6 to v8.5.4), we observed a step change in what matters most for real OLTP services:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Tail latency collapsed<\/strong>: P999 dropped from&nbsp;tens of seconds&nbsp;to sub-second, and to tens of milliseconds in some windows.&nbsp;<\/li>\n\n\n\n<li><strong>Slow-query pressure dropped<\/strong>: <a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\/identify-slow-queries\/\">Slow query bursts<\/a> fell by roughly&nbsp;30%\u201390%,&nbsp;depending on the time window.<\/li>\n\n\n\n<li><strong>Resource behavior became smoother<\/strong>: <a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\/tikv-overview\/\">TiKV<\/a> CPU usage shifted down by about&nbsp;10%\u201325%&nbsp;on average, with fewer extreme spikes.<\/li>\n<\/ul>\n\n\n\n<p>The following chart summarizes the latency distribution shift we observed across these production clusters. 
It compares the percentile-latency curve before (TiDB 7.5) and after (TiDB 8.5) the upgrade:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1840\" height=\"995\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/970d1568-3686-42ac-8180-f4d4883954c1.png\" alt=\"\" class=\"wp-image-32822\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/970d1568-3686-42ac-8180-f4d4883954c1.png 1840w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/970d1568-3686-42ac-8180-f4d4883954c1-300x162.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/970d1568-3686-42ac-8180-f4d4883954c1-1024x554.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/970d1568-3686-42ac-8180-f4d4883954c1-768x415.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/970d1568-3686-42ac-8180-f4d4883954c1-1536x831.png 1536w\" sizes=\"auto, (max-width: 1840px) 100vw, 1840px\" \/><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"has-text-align-center\"><em>Fig. 1: Latency improvement in mixed read\/write OLTP workloads running on large-scale production clusters (100+ TiKV nodes, 4 TiB+ data per store, 150K+ QPS)<\/em>.<\/p>\n<\/blockquote>\n\n\n\n<p>These results are not workload-specific; they reflect systematic improvements in how TiDB handles latency under pressure.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Which Workloads Benefit Most from TiDB 8.5<\/h3>\n\n\n\n<p>To understand where these improvements come from\u2014and whether they apply to your system\u2014it\u2019s important to look at the workload characteristics that trigger tail latency in TiDB.<\/p>\n\n\n\n<p>As the production results show, not every deployment sees the same level of improvement. 
The gains are most pronounced under specific conditions:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Optimization<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Root Cause<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Best Suited For<\/strong><\/td><\/tr><tr><td>Memory allocation pooling (Remove)<\/td><td>Go runtime GC pauses and goroutine scheduling delays cause sporadic latency spikes on the SQL layer<\/td><td>High QPS (100K+) with many short-lived OLTP queries; Go GC pauses visible in P99; tens of thousands of rows changing per second<\/td><\/tr><tr><td>ART membuffer (Replace)<\/td><td>Red-Black Tree MemDB spends most CPU time on key comparisons; O(log n) comparisons with long common-prefix keys are expensive<\/td><td>Large tables with many indexes; keys sharing long common prefixes (table prefixes, index prefixes); write-heavy DML on wide tables<\/td><\/tr><tr><td>Async snapshot \/ SST mutex (Reorder)<\/td><td>Storage async snapshot duration grows with SST file count; mutex contention blocks foreground writes during metadata operations<\/td><td>Large data volume per TiKV store (e.g., 4 TiB+); high SST file counts (100K\u2013200K+); sustained write pressure alongside latency-sensitive reads<\/td><\/tr><tr><td>gRPC batching \/ TSO parallelizing (Remove)<\/td><td>Coordination overhead with PD and TiKV adds per-query round trips<\/td><td>High fan-out queries; deployments with many TiKV nodes; network latency to PD\/TiKV<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>In practice, the largest gains come from systems that combine high QPS, large data volumes, many indexes, and strict latency SLOs\u2014where small inefficiencies compound into tail latency spikes.<\/p>\n\n\n\n<p>For lower-QPS workloads or smaller datasets, the impact is less dramatic, since tail latency is less dominated 
by coordination and storage-level stalls. However, reduced baseline overhead still delivers measurable improvements.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"How_TiDB_85_Reduces_P999_Latency_in_Distributed_Databases\"><\/span><strong>How TiDB 8.5 Reduces P999 Latency in Distributed Databases<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>TiDB 8.5 addresses tail latency challenges through systematic optimizations across all layers of architecture. Rather than incremental tweaks, these are fundamental engineering improvements targeting the root causes of performance variability.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Optimization Strategy<\/h3>\n\n\n\n<p>In a distributed system like TiDB, latency emerges from dependencies across multiple stages\u2014parallel execution, coordination, and wait events across the request path. The question is: How do you systematically reduce it?<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"3364\" height=\"1903\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/f271046c-090f-440a-8d49-f4c73867edb3.png\" alt=\"\" class=\"wp-image-32833\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/f271046c-090f-440a-8d49-f4c73867edb3.png 3364w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/f271046c-090f-440a-8d49-f4c73867edb3-300x170.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/f271046c-090f-440a-8d49-f4c73867edb3-1024x579.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/f271046c-090f-440a-8d49-f4c73867edb3-768x434.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/f271046c-090f-440a-8d49-f4c73867edb3-1536x869.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/f271046c-090f-440a-8d49-f4c73867edb3-2048x1159.png 2048w\" sizes=\"auto, (max-width: 3364px) 100vw, 3364px\" \/><\/figure>\n\n\n\n<p 
class=\"has-text-align-center\"><em>Fig. 2: Latency is introduced at every stage and effective optimization requires reducing work, round trips, and stalls across the entire stack.<\/em><\/p>\n\n\n\n<p>These improvements are not isolated\u2014they follow a consistent pattern. According to <a href=\"https:\/\/taesoo.kim\/pubs\/2025\/park%3Asysgpt.pdf?\">Serial Performance Optimization (OSDI\u201925)<\/a>, latency can be reduced through three core levers:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Remove<\/strong>&nbsp;tasks from the critical path (fewer waits, fewer round trips)<\/li>\n\n\n\n<li><strong>Replace<\/strong>&nbsp;expensive tasks with cheaper ones (better data structures, less contention)<\/li>\n\n\n\n<li><strong>Reorder<\/strong>&nbsp;tasks to avoid stalls (pooling, pipelining, moving work out of hot locks)<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"3588\" height=\"1209\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/1960bc87-8b29-443c-abc2-5c9569a0003a.png\" alt=\"\" class=\"wp-image-32820\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/1960bc87-8b29-443c-abc2-5c9569a0003a.png 3588w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/1960bc87-8b29-443c-abc2-5c9569a0003a-300x101.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/1960bc87-8b29-443c-abc2-5c9569a0003a-1024x345.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/1960bc87-8b29-443c-abc2-5c9569a0003a-768x259.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/1960bc87-8b29-443c-abc2-5c9569a0003a-1536x518.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081738\/1960bc87-8b29-443c-abc2-5c9569a0003a-2048x690.png 2048w\" sizes=\"auto, (max-width: 3588px) 100vw, 3588px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 
3: The three optimization strategies described in Serial Performance Optimization (OSDI\u201925)<\/em>.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th class=\"has-text-align-center\" data-align=\"center\">Strategy<\/th><th>TiDB 8.5 Changes<\/th><th>Impact<\/th><\/tr><\/thead><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Remove<\/strong><\/td><td>Memory pooling, goroutine reuse, reduced RPC coordination<\/td><td>Lower GC and scheduling overhead<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Replace<\/strong><\/td><td>ART replacing Red-Black Tree MemDB<\/td><td>Fewer comparisons, better cache locality<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Reorder<\/strong><\/td><td>Async snapshot + SST mutex optimizations<\/td><td>Eliminates long stalls<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Key Optimization<\/h3>\n\n\n\n<p>The TiDB 8.5 performance work is a combination of these three moves across the TiDB -&gt; TiKV -&gt; <a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\/rocksdb-overview\/\">RocksDB<\/a> path:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">The Removal<\/h4>\n\n\n\n<p><strong>When this matters:<\/strong>&nbsp;The target cluster runs high-QPS OLTP workloads (100K+ QPS) with many short-lived queries. Under these conditions, the cumulative cost of per-query memory allocation, goroutine creation, and Go runtime scheduling\/GC becomes a significant fraction of total query latency.<\/p>\n\n\n\n<p>For TiDB, we typically assume that&nbsp;goroutines&nbsp;and&nbsp;memory&nbsp;are cheap and low-cost resources. However, in <a href=\"https:\/\/www.pingcap.com\/ko\/article\/tidb-transforming-data-management-with-real-time-oltp-olap\/\">OLTP<\/a> scenarios, many queries are sufficiently short that the time spent &#8220;waiting to do the real work&#8221; becomes noticeable. 
Examples include scheduling waits in the Go runtime and memory garbage collection.<\/p>\n\n\n\n<p>The fix is to take that waiting and redundant work off the request path. To that end, TiDB 8.5 introduces several memory allocation and goroutine optimizations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Goroutine Reuse:<\/strong>&nbsp;Instead of creating new goroutines for each coprocessor request, TiDB avoids starting extra goroutines in certain coprocessor\/distsql requests.<\/li>\n\n\n\n<li><strong>Reduced Allocations in ExecDetails:<\/strong>&nbsp;Execution detail structures are now pooled and reused rather than allocated per query.<\/li>\n\n\n\n<li><strong>RuntimeStats Optimization:<\/strong>&nbsp;Runtime statistics collection has been optimized to reduce allocation overhead.<\/li>\n\n\n\n<li><strong>BuildCopIterator Improvements:<\/strong>&nbsp;The coprocessor iterator construction path now uses pre-allocated buffers.<\/li>\n<\/ul>\n\n\n\n<p>One example is the memory allocated when TiDB processes&nbsp;<code>handle<\/code>&nbsp;keys (usually the primary key). 
It may seem trivial, but the impact is actually significant.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1632\" height=\"1024\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081741\/6ba0dac2-8747-464c-988c-ae76b0dead1a.png\" alt=\"\" class=\"wp-image-32832\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081741\/6ba0dac2-8747-464c-988c-ae76b0dead1a.png 1632w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081741\/6ba0dac2-8747-464c-988c-ae76b0dead1a-300x188.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081741\/6ba0dac2-8747-464c-988c-ae76b0dead1a-1024x643.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081741\/6ba0dac2-8747-464c-988c-ae76b0dead1a-768x482.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081741\/6ba0dac2-8747-464c-988c-ae76b0dead1a-1536x964.png 1536w\" sizes=\"auto, (max-width: 1632px) 100vw, 1632px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 4: CPU flame graph before and after optimization, highlighting reduced allocation overhead on the critical path<\/em><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">The Replacement<\/h4>\n\n\n\n<p><strong>When this matters:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The system runs many large transactions or large-batch DML statements, which generate a high volume of&nbsp;<code>MemDB<\/code>&nbsp;reads and writes during execution, a pattern the Red-Black Tree handles poorly.<\/li>\n\n\n\n<li>The user tables have many indexes, or user keys share long common prefixes (which is the norm in TiDB: all keys for a table start with&nbsp;<code>t{tableID}_r<\/code>&nbsp;or&nbsp;<code>t{tableID}_i{indexID}<\/code>). 
The longer the common prefix, the more expensive each Red-Black Tree comparison becomes, and the more CPU cycles are wasted on redundant byte comparisons.&nbsp;<\/li>\n<\/ul>\n\n\n\n<p>Make hot-path data structures and algorithms cache-friendly. Some performance wins are not about distributed execution at all\u2014they come from replacing a core in-memory mechanism with one that matches the workload\u2019s key shapes. In TiDB, transactional workloads often involve keys with long common prefixes. Replacing comparison-heavy structures with prefix-friendly ones (e.g., radix-tree style indexing for transactional mem-buffers) reduces CPU cycles per mutation and improves cache locality. The result is higher throughput and, importantly, less latency jitter under load.<\/p>\n\n\n\n<p>The <a href=\"https:\/\/docslib.org\/doc\/6727469\/the-adaptive-radix-tree-artful-indexing-for-main-memory-databases?\">&#8220;comparison problem&#8221; of Red-Black Tree based MemDB in TiDB<\/a>:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1559\" height=\"873\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/1b9e7e0b-aad7-48ea-a15d-b817e10ca4e6.png\" alt=\"\" class=\"wp-image-32823\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/1b9e7e0b-aad7-48ea-a15d-b817e10ca4e6.png 1559w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/1b9e7e0b-aad7-48ea-a15d-b817e10ca4e6-300x168.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/1b9e7e0b-aad7-48ea-a15d-b817e10ca4e6-1024x573.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/1b9e7e0b-aad7-48ea-a15d-b817e10ca4e6-768x430.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/1b9e7e0b-aad7-48ea-a15d-b817e10ca4e6-1536x860.png 1536w\" sizes=\"auto, (max-width: 1559px) 100vw, 1559px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 
5: Red-Black Tree MemDB CPU profile showing comparison-heavy hot path<\/em><\/p>\n\n\n\n<p>TiDB 8.5 replaces the Red-Black Tree with ART as the default membuffer. ART is a&nbsp;radix tree-based in-memory index&nbsp;that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Provides O(k) lookup complexity where k is key length,&nbsp;independent of the number of keys.<\/li>\n\n\n\n<li>Is particularly&nbsp;efficient for keys with long common prefixes&nbsp;(e.g., table prefixes, index prefixes).<\/li>\n\n\n\n<li>Offers better cache locality compared to pointer-based tree structures.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2243\" height=\"1688\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/e358d934-0813-4fbd-a0b3-27aaf3393031.png\" alt=\"\" class=\"wp-image-32828\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/e358d934-0813-4fbd-a0b3-27aaf3393031.png 2243w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/e358d934-0813-4fbd-a0b3-27aaf3393031-300x226.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/e358d934-0813-4fbd-a0b3-27aaf3393031-1024x771.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/e358d934-0813-4fbd-a0b3-27aaf3393031-768x578.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/e358d934-0813-4fbd-a0b3-27aaf3393031-1536x1156.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/e358d934-0813-4fbd-a0b3-27aaf3393031-2048x1541.png 2048w\" sizes=\"auto, (max-width: 2243px) 100vw, 2243px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 
6: ART MemDB reduces DML execution time across workloads compared to Red-Black Tree<\/em><\/p>\n\n\n\n<p>In benchmarks on the sysbench schema covering table copy (&#8220;insert into select * from&#8221;), update_non_index, update_index, and delete, the ART-based MemDB outperforms the Red-Black Tree in every test.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">The Reorder<\/h4>\n\n\n\n<p><strong>When this matters:<\/strong>&nbsp;The TiKV stores in the target cluster hold large data volumes (multiple TB per node), resulting in high SST file counts (100K\u2013200K). Under these conditions, storage metadata operations (like acquiring async snapshots) hold mutexes for durations that scale with file count, blocking foreground read and write operations. This is the primary cause of the rare-but-catastrophic multi-second latency spikes observed at P99.9+.<\/p>\n\n\n\n<p>The easiest way to get minutes-long tails is to occasionally hit a \u201cstop-the-world\u201d style stall: stack growth, goroutine churn, lock contention, or storage-engine critical sections that grow with data volume. TiDB 8.5 attacks this by making the system more predictable:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pool and reuse goroutines instead of churning them per request on common executor paths.&nbsp;<\/li>\n\n\n\n<li>Move work out of contended locks (especially inside storage-engine metadata updates that can scale with SST file count).&nbsp;<\/li>\n\n\n\n<li>Avoid pausing foreground work during operational tasks (for example, reduce the latency impact of SST ingestion by allowing safe concurrent writes and using latching where correctness needs it). 
Even when these changes look \u201csmall\u201d in microbenchmarks, they disproportionately improve P99\/P999 because they remove rare-but-catastrophic stalls.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2270\" height=\"1751\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/cce70a4d-65c9-4f54-ba1a-c6ef3555815e.png\" alt=\"\" class=\"wp-image-32827\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/cce70a4d-65c9-4f54-ba1a-c6ef3555815e.png 2270w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/cce70a4d-65c9-4f54-ba1a-c6ef3555815e-300x231.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/cce70a4d-65c9-4f54-ba1a-c6ef3555815e-1024x790.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/cce70a4d-65c9-4f54-ba1a-c6ef3555815e-768x592.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/cce70a4d-65c9-4f54-ba1a-c6ef3555815e-1536x1185.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/cce70a4d-65c9-4f54-ba1a-c6ef3555815e-2048x1580.png 2048w\" sizes=\"auto, (max-width: 2270px) 100vw, 2270px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 7: As the number of SST files increases, the tail latency of async snapshot acquisition rises, which degrades performance at large data volumes.<\/em><\/p>\n\n\n\n<p>The improvements in RocksDB are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Hold&nbsp;<code>VersionStorageInfo<\/code>&nbsp;through a pointer so that it can be freed in a background thread. 
Previously, this structure was freed while the mutex was held, so foreground writes stalled when SST file counts were high.<\/li>\n\n\n\n<li>Move the generation of&nbsp;<code>file_locations<\/code>&nbsp;from&nbsp;<code>SaveTo<\/code>&nbsp;into&nbsp;<code>PrepareApply<\/code>, which runs outside the mutex.<\/li>\n<\/ul>\n\n\n\n<p>The results are as follows:&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2753\" height=\"1392\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/909e0686-8ffb-46fe-a37a-31dab925fbb5.png\" alt=\"\" class=\"wp-image-32831\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/909e0686-8ffb-46fe-a37a-31dab925fbb5.png 2753w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/909e0686-8ffb-46fe-a37a-31dab925fbb5-300x152.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/909e0686-8ffb-46fe-a37a-31dab925fbb5-1024x518.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/909e0686-8ffb-46fe-a37a-31dab925fbb5-768x388.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/909e0686-8ffb-46fe-a37a-31dab925fbb5-1536x777.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/909e0686-8ffb-46fe-a37a-31dab925fbb5-2048x1036.png 2048w\" sizes=\"auto, (max-width: 2753px) 100vw, 2753px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2747\" height=\"1404\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081742\/5134370c-a40a-41e7-b4f4-600fc0aa6a1f.png\" alt=\"\" class=\"wp-image-32825\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081742\/5134370c-a40a-41e7-b4f4-600fc0aa6a1f.png 2747w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081742\/5134370c-a40a-41e7-b4f4-600fc0aa6a1f-300x153.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081742\/5134370c-a40a-41e7-b4f4-600fc0aa6a1f-1024x523.png 1024w, 
https:\/\/static.pingcap.com\/files\/2026\/04\/01081742\/5134370c-a40a-41e7-b4f4-600fc0aa6a1f-768x393.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081742\/5134370c-a40a-41e7-b4f4-600fc0aa6a1f-1536x785.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081742\/5134370c-a40a-41e7-b4f4-600fc0aa6a1f-2048x1047.png 2048w\" sizes=\"auto, (max-width: 2747px) 100vw, 2747px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 8: Storage async snapshot duration before and after upgrading to TiDB 8.5.4<\/em>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">System-Wide Improvements in TiDB 8.5<\/h3>\n\n\n\n<p>The improvements are not incremental tweaks but fundamental engineering enhancements that address the root causes of performance variability. The key engineering improvements, by layer, are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>TiDB Layer:<\/strong>&nbsp;Memory allocation and goroutine optimization, Coprocessor\/DistSQL worker optimization, and the ART membuffer. For additional improvements, see the release notes.<\/li>\n\n\n\n<li><strong>TiKV Layer:<\/strong>&nbsp;Async snapshot optimization and SST ingestion without write pauses. For additional improvements, see the release notes.<\/li>\n<\/ul>\n\n\n\n<p>TiDB v8.5 is not &#8220;one magic feature&#8221;. It&#8217;s the result of applying a consistent performance strategy across the stack. More importantly, these optimizations and improvements are&nbsp;enabled by default&nbsp;in the TiDB v8.5.4 kernel, allowing the vast majority of OLTP scenarios to benefit from them.&nbsp;<\/p>\n\n\n\n<p>We now turn to production data to see these improvements in action.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"TiDB_854_Production_Results\"><\/span>TiDB 8.5.4 Production Results<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The true measure of any optimization is its impact in production environments. 
These results show how TiDB 8.5 reduces P999 latency under real production workloads in three clusters. All three clusters serve mission-critical online trading services, where tail latency directly affects trading success rates and business outcomes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Production Cluster Profiles<\/h3>\n\n\n\n<p>Understanding these results requires understanding the workloads. Here is a summary of the three production clusters:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\"><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Case 1 (150K QPS)<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Case 2 (155K QPS)<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>Case 3 (31K QPS)<\/strong><\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Environment<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\">20+ TiDB\/TiKV\/PD 32C nodes<br>24h production metrics<\/td><td class=\"has-text-align-center\" data-align=\"center\">20+ TiDB\/TiKV\/PD 32C nodes<br>24h production metrics<\/td><td class=\"has-text-align-center\" data-align=\"center\">10+ TiDB\/TiKV\/PD 32C nodes<br>24h production metrics<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Workload type<\/strong><\/td><td>Mixed read\/write, online trading platform<\/td><td>Mixed read\/write, online SaaS services<\/td><td>Mixed read\/write, mission-critical online services<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Read\/write ratio<\/strong><\/td><td>Both heavy \u2014 tens of thousands of rows changed or updated per second<\/td><td>Both heavy \u2014 similar intensity to Case 1<\/td><td>Moderate load, but with similar latency-sensitive access patterns<\/td><\/tr><tr><td 
class=\"has-text-align-center\" data-align=\"center\"><strong>Data volume per TiKV store<\/strong><\/td><td>~4 TiB<\/td><td>~4 TiB<\/td><td>Smaller, but non-trivial<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>SST file count per node<\/strong><\/td><td>~200K<\/td><td>~200K<\/td><td>Lower<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Cluster scale<\/strong><\/td><td>100+ TiKV nodes, ~10K region peers per store<\/td><td>100+ TiKV nodes, ~10K region peers per store<\/td><td>Smaller cluster<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Transaction pattern<\/strong><\/td><td>Standard OLTP transactions<\/td><td>Large transactions (non-transactional DML batches with 200K+ rows per batch)<\/td><td>Standard OLTP transactions<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Latency sensitivity<\/strong><\/td><td>Critical \u2014 read latency spikes directly affect trading success ratio<\/td><td>Critical \u2014 same business impact<\/td><td>Critical \u2014 same business impact<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Why the improvements vary across cases:<\/strong>&nbsp;Case 1 and Case 2 are large-scale, high-QPS clusters where all the optimizations are active simultaneously. 
Case 3 has lower QPS and a smaller cluster, so the absolute latency numbers were already better \u2014 but the&nbsp;<em>relative<\/em>&nbsp;slow query reduction is the most dramatic.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">P999 Latency Improvements<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2266\" height=\"1692\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/68554be8-be27-4758-965a-23bc239432d4.png\" alt=\"\" class=\"wp-image-32826\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/68554be8-be27-4758-965a-23bc239432d4.png 2266w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/68554be8-be27-4758-965a-23bc239432d4-300x224.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/68554be8-be27-4758-965a-23bc239432d4-1024x765.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/68554be8-be27-4758-965a-23bc239432d4-768x573.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/68554be8-be27-4758-965a-23bc239432d4-1536x1147.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081745\/68554be8-be27-4758-965a-23bc239432d4-2048x1529.png 2048w\" sizes=\"auto, (max-width: 2266px) 100vw, 2266px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 
9: P999 Latency reduction from v7.5 to v8.5<\/em><\/p>\n\n\n\n<p>In one production cluster, P999 dropped from minute-level to sub-100ms.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Resource Efficiency Gains<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2724\" height=\"1929\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/f4d9277b-05c7-4026-9728-eb0a2b789235.png\" alt=\"\" class=\"wp-image-32829\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/f4d9277b-05c7-4026-9728-eb0a2b789235.png 2724w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/f4d9277b-05c7-4026-9728-eb0a2b789235-300x212.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/f4d9277b-05c7-4026-9728-eb0a2b789235-1024x725.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/f4d9277b-05c7-4026-9728-eb0a2b789235-768x544.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/f4d9277b-05c7-4026-9728-eb0a2b789235-1536x1088.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/f4d9277b-05c7-4026-9728-eb0a2b789235-2048x1450.png 2048w\" sizes=\"auto, (max-width: 2724px) 100vw, 2724px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 
10: Comparison between average and peak TiKV CPU usage in v7.5 and v8.5 <\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Slow Query Elimination<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2400\" height=\"1536\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081740\/08bf994f-2d9c-4bd9-afc5-b79901498684.png\" alt=\"\" class=\"wp-image-32824\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081740\/08bf994f-2d9c-4bd9-afc5-b79901498684.png 2400w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081740\/08bf994f-2d9c-4bd9-afc5-b79901498684-300x192.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081740\/08bf994f-2d9c-4bd9-afc5-b79901498684-1024x655.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081740\/08bf994f-2d9c-4bd9-afc5-b79901498684-768x492.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081740\/08bf994f-2d9c-4bd9-afc5-b79901498684-1536x983.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081740\/08bf994f-2d9c-4bd9-afc5-b79901498684-2048x1311.png 2048w\" sizes=\"auto, (max-width: 2400px) 100vw, 2400px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 
11: Slow query reduction across the 3 cases and two versions <\/em><\/p>\n\n\n\n<p><strong>&gt;90%&nbsp;<\/strong>Slow Query Reduction (Case 3).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">DML Operation Performance<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2400\" height=\"1360\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/9d439ecb-839e-4357-9a71-005e530c07e8.png\" alt=\"\" class=\"wp-image-32821\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/9d439ecb-839e-4357-9a71-005e530c07e8.png 2400w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/9d439ecb-839e-4357-9a71-005e530c07e8-300x170.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/9d439ecb-839e-4357-9a71-005e530c07e8-1024x580.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/9d439ecb-839e-4357-9a71-005e530c07e8-768x435.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/9d439ecb-839e-4357-9a71-005e530c07e8-1536x870.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081739\/9d439ecb-839e-4357-9a71-005e530c07e8-2048x1161.png 2048w\" sizes=\"auto, (max-width: 2400px) 100vw, 2400px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 
12: DML P999 latency comparison across both versions<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Extending_Tail_Latency_Improvements_in_TiDB_855\"><\/span>Extending Tail Latency Improvements in TiDB 8.5.5<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The improvements in TiDB 8.5 focus on eliminating latency variance by removing stalls and reducing overhead across the request path.<\/p>\n\n\n\n<p>In TiDB 8.5.5, we build on this foundation, not by targeting new sources of variance, but by further shortening the critical path through better execution locality and fewer network round trips.<\/p>\n\n\n\n<p>These enhancements follow the same principles outlined earlier: primarily&nbsp;<strong>Remove<\/strong>&nbsp;(eliminating unnecessary coordination) and&nbsp;<strong>Reorder<\/strong>&nbsp;(moving work closer to where data resides).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2724\" height=\"1676\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/1b91a68f-de47-46cc-b9b6-7238ceb79422.png\" alt=\"\" class=\"wp-image-32830\" srcset=\"https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/1b91a68f-de47-46cc-b9b6-7238ceb79422.png 2724w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/1b91a68f-de47-46cc-b9b6-7238ceb79422-300x185.png 300w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/1b91a68f-de47-46cc-b9b6-7238ceb79422-1024x630.png 1024w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/1b91a68f-de47-46cc-b9b6-7238ceb79422-768x473.png 768w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/1b91a68f-de47-46cc-b9b6-7238ceb79422-1536x945.png 1536w, https:\/\/static.pingcap.com\/files\/2026\/04\/01081744\/1b91a68f-de47-46cc-b9b6-7238ceb79422-2048x1260.png 2048w\" sizes=\"auto, (max-width: 2724px) 100vw, 2724px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Fig. 
13: Cutting index lookup from two network round trips to one through pushdown execution and data locality<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Index Lookup Pushdown<\/h3>\n\n\n\n<p>When index and table data are co-located, an <a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\/optimizer-hints\/#index_lookup_pushdownt1_name-idx1_name--idx2_name--new-in-v855\">index lookup<\/a> can now be executed in a single coprocessor RPC instead of two.<\/p>\n\n\n\n<p>This removes an entire network round trip from the critical path, reducing both latency and coordination overhead for lookup-heavy queries.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data Affinity Scheduling<\/h3>\n\n\n\n<p><a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\/table-affinity\/\">Data affinity scheduling<\/a> increases the likelihood that related data\u2014such as table rows and their corresponding index entries, or partition-level working sets\u2014remains co-located within the same TiKV node.<\/p>\n\n\n\n<p>This improves pushdown hit rates and enables more queries to execute with fewer coordination steps, further reducing latency under load.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Impact<\/h3>\n\n\n\n<p>In suitable workloads, these optimizations provide additional improvements on top of the gains from TiDB 8.5. Internal testing shows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Up to 20\u201330% further tail latency reduction<\/li>\n\n\n\n<li>Up to 20% higher tpmC in TPC-C benchmarks<\/li>\n<\/ul>\n\n\n\n<p><a href=\"https:\/\/tidbcloud.com\/free-trial\/\">Start with TiDB 8.5 today<\/a><em> and see how far you can push OLTP performance with the upcoming TiDB 8.5.5 enhancements.<\/em><\/p>","protected":false},"excerpt":{"rendered":"<p>Reducing P999 latency in distributed databases is one of the hardest challenges in modern OLTP systems. 
A handful of slow requests can cascade across services, break SLOs, and directly impact business outcomes, especially in latency-sensitive environments like trading platforms and real-time applications. This is the challenge of tail latency. As systems scale, variability compounds: queueing [&hellip;]<\/p>\n","protected":false},"author":341,"featured_media":32912,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ub_ctt_via":"","footnotes":""},"categories":[6],"tags":[28,147,9,111,22],"class_list":["post-32819","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-engineering","tag-architecture","tag-distributed-sql","tag-scalability","tag-tidb","tag-tikv"],"acf":[],"featured_image_src":"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png","author_info":{"display_name":"Rui Xu","author_link":"https:\/\/www.pingcap.com\/ko\/blog\/author\/rui-xu\/"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Reducing P999 Latency in Distributed Databases with TiDB 8.5<\/title>\n<meta name=\"description\" content=\"Reduce P999 latency in distributed databases with TiDB 8.5\u2014backed by real production results and system-level optimizations.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Reducing P999 Latency in Distributed Databases with TiDB 8.5\" \/>\n<meta property=\"og:description\" content=\"Reduce P999 latency in distributed databases with TiDB 8.5\u2014backed by real production results and system-level optimizations.\" 
\/>\n<meta property=\"og:url\" content=\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/\" \/>\n<meta property=\"og:site_name\" content=\"TiDB\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"article:published_time\" content=\"2026-04-07T13:00:46+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-04-10T18:56:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png\" \/>\n\t<meta property=\"og:image:width\" content=\"3751\" \/>\n\t<meta property=\"og:image:height\" content=\"1251\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Rui Xu\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/static.pingcap.com\/files\/2026\/03\/27132746\/tidb_twitter_1600x900-4.png\" \/>\n<meta name=\"twitter:creator\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rui Xu\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/tidb-8-5-reduce-p999-latency-distributed-database\/\"},\"author\":{\"name\":\"Rui Xu\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/9dfc05f59f5009f160edb4c979276ea1\"},\"headline\":\"Reducing P999 Latency in Distributed Databases with TiDB 8.5\",\"datePublished\":\"2026-04-07T13:00:46+00:00\",\"dateModified\":\"2026-04-10T18:56:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/tidb-8-5-reduce-p999-latency-distributed-database\/\"},\"wordCount\":2385,\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png\",\"keywords\":[\"Architecture\",\"Distributed SQL\",\"Scalability\",\"TiDB\",\"TiKV\"],\"articleSection\":[\"Engineering\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/tidb-8-5-reduce-p999-latency-distributed-database\/\",\"url\":\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/\",\"name\":\"Reducing P999 Latency in Distributed Databases with TiDB 
8.5\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png\",\"datePublished\":\"2026-04-07T13:00:46+00:00\",\"dateModified\":\"2026-04-10T18:56:00+00:00\",\"description\":\"Reduce P999 latency in distributed databases with TiDB 8.5\u2014backed by real production results and system-level optimizations.\",\"breadcrumb\":{\"@id\":\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#primaryimage\",\"url\":\"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png\",\"width\":3751,\"height\":1251},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pingcap.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Reducing P999 Latency in Distributed Databases with TiDB 8.5\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pingcap.com\/#website\",\"url\":\"https:\/\/www.pingcap.com\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at 
Scale\",\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pingcap.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pingcap.com\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\/\/www.pingcap.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/facebook.com\/pingcap2015\",\"https:\/\/x.com\/PingCAP\",\"https:\/\/linkedin.com\/company\/pingcap\",\"https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/9dfc05f59f5009f160edb4c979276ea1\",\"name\":\"Rui Xu\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"caption\":\"Rui Xu\"},\"url\":\"https:\/\/www.pingcap.com\/ko\/blog\/author\/rui-xu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Reducing P999 Latency in Distributed Databases with TiDB 8.5","description":"Reduce P999 latency in distributed databases with TiDB 8.5\u2014backed by real production results and system-level optimizations.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/","og_locale":"ko_KR","og_type":"article","og_title":"Reducing P999 Latency in Distributed Databases with TiDB 8.5","og_description":"Reduce P999 latency in distributed databases with TiDB 8.5\u2014backed by real production results and system-level optimizations.","og_url":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","article_published_time":"2026-04-07T13:00:46+00:00","article_modified_time":"2026-04-10T18:56:00+00:00","og_image":[{"width":3751,"height":1251,"url":"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png","type":"image\/png"}],"author":"Rui Xu","twitter_card":"summary_large_image","twitter_image":"https:\/\/static.pingcap.com\/files\/2026\/03\/27132746\/tidb_twitter_1600x900-4.png","twitter_creator":"@PingCAP","twitter_site":"@PingCAP","twitter_misc":{"Written by":"Rui Xu","Est. 
reading time":"12\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#article","isPartOf":{"@id":"https:\/\/www.pingcap.com\/blog\/tidb-8-5-reduce-p999-latency-distributed-database\/"},"author":{"name":"Rui Xu","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/9dfc05f59f5009f160edb4c979276ea1"},"headline":"Reducing P999 Latency in Distributed Databases with TiDB 8.5","datePublished":"2026-04-07T13:00:46+00:00","dateModified":"2026-04-10T18:56:00+00:00","mainEntityOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/tidb-8-5-reduce-p999-latency-distributed-database\/"},"wordCount":2385,"publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"image":{"@id":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png","keywords":["Architecture","Distributed SQL","Scalability","TiDB","TiKV"],"articleSection":["Engineering"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/blog\/tidb-8-5-reduce-p999-latency-distributed-database\/","url":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/","name":"Reducing P999 Latency in Distributed Databases with TiDB 8.5","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#primaryimage"},"image":{"@id":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png","datePublished":"2026-04-07T13:00:46+00:00","dateModified":"2026-04-10T18:56:00+00:00","description":"Reduce P999 latency in distributed databases with TiDB 8.5\u2014backed 
by real production results and system-level optimizations.","breadcrumb":{"@id":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#primaryimage","url":"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png","width":3751,"height":1251},{"@type":"BreadcrumbList","@id":"https:\/\/thenewstack.io\/4-data-architecture-decisions-that-make-or-break-agentic-systems\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingcap.com\/"},{"@type":"ListItem","position":2,"name":"Reducing P999 Latency in Distributed Databases with TiDB 8.5"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"\ud2f0DB","description":"TiDB | SQL at 
Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]},{"@type":"Person","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/9dfc05f59f5009f160edb4c979276ea1","name":"Rui Xu","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/","url":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","contentUrl":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","caption":"Rui Xu"},"url":"https:\/\/www.pingcap.com\/ko\/blog\/author\/rui-xu\/"}]}},"grav_blocks":false,"card_markup":"<a class=\"card-resource bg-white\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/tidb-8-5-reduce-p999-latency-distributed-database\/\"><div class=\"card-resource__image-container\"><img class=\"card-resource__image\" alt=\"20260407-162905\" src=\"https:\/\/static.pingcap.com\/files\/2026\/04\/07040029\/20260407-162905.png\" loading=\"lazy\" width=3751 height=1251 \/><\/div><div class=\"card-resource__content-container\"><div 
class=\"card-resource__content-head\"><div class=\"card-resource__category\">Engineering<\/div><\/div><h5 class=\"card-resource__title\">Reducing P999 Latency in Distributed Databases with TiDB 8.5<\/h5><\/div><\/a>","_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/32819","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/users\/341"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/comments?post=32819"}],"version-history":[{"count":15,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/32819\/revisions"}],"predecessor-version":[{"id":32934,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/32819\/revisions\/32934"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media\/32912"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=32819"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/categories?post=32819"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tags?post=32819"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}