{"id":700,"date":"2021-06-30T00:00:00","date_gmt":"2021-06-30T00:00:00","guid":{"rendered":"https:\/\/en.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/"},"modified":"2025-03-04T06:47:41","modified_gmt":"2025-03-04T14:47:41","slug":"how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact","status":"publish","type":"post","link":"https:\/\/www.pingcap.com\/ko\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/","title":{"rendered":"How We Trace a KV Database with Less than 5% Performance Impact"},"content":{"rendered":"<p><a href=\"https:\/\/tikv.org\/\">TiKV<\/a> is a distributed key-value database. It has higher performance requirements than a regular application, so tracing tools must have minimal impact. This article describes <strong>how we achieved tracing all requests&#8217; time consumption in TiKV with less than 5% performance impact<\/strong>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Background_knowledge\"><\/span>Background knowledge<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Logs, metrics, and traces are the three pillars of system observability. The following figure shows their relationship:<\/p>\n<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-25513\" src=\"https:\/\/static.pingcap.com\/files\/2021\/06\/04064724\/logs-metrics-traces-scaled.jpg\" alt=\"logs-metrics-traces\" width=\"2560\" height=\"905\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/04064724\/logs-metrics-traces-scaled.jpg 2560w, https:\/\/static.pingcap.com\/files\/2021\/06\/04064724\/logs-metrics-traces-300x106.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/04064724\/logs-metrics-traces-1024x362.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/04064724\/logs-metrics-traces-768x271.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/04064724\/logs-metrics-traces-1536x543.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/04064724\/logs-metrics-traces-2048x724.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/04064724\/logs-metrics-traces-1440x509.jpg 1440w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/>\n<div class=\"caption-center\">Logs, metrics, and traces<\/div>\n<ul>\n<li>Logs record discrete error and status information.<\/li>\n<li>Metrics record and present aggregatable data.<\/li>\n<li>Traces are a series of events for a single request.<\/li>\n<\/ul>\n<p>TiKV has a complete log and metric system, but lacks traces. Therefore, when we troubleshoot TiKV and TiKV&#8217;s SQL layer, <a href=\"https:\/\/docs.pingcap.com\/tidb\/stable\">TiDB<\/a>, we may encounter these issues:<\/p>\n<ul>\n<li><strong>There is no correlation between observation data.<\/strong> To trace and diagnose the system, you must understand all the operations in a request and their corresponding metrics. Few people have such comprehensive knowledge.<\/li>\n<li><strong>It&#8217;s difficult to trace request jitters.<\/strong> Monitoring metrics such as AVG, P99, and MAX don&#8217;t reflect requests&#8217; performance jitter when request payloads are mixed together.<\/li>\n<\/ul>\n<p><strong>Tracing solves the two problems.<\/strong> Now, I&#8217;ll share in detail how we implemented high-performance tracing in TiKV. It is still an experimental feature in TiKV, and, to open it, you need a specific code branch. If you&#8217;re interested, follow <a href=\"https:\/\/github.com\/tikv\/tikv\/pull\/8981\">Introduce tracing framework (#8981)<\/a> on GitHub.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Basic_concepts\"><\/span>Basic concepts<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Tracing shows the execution path of a request in the system. For example, the following figure is a trace of an SQL statement&#8217;s execution process from TiDB to TiKV:<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2352\" height=\"1078\" class=\"wp-image-701\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/trace-sql-statement-execution.jpg\" alt=\"Trace an SQL statement's execution\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/trace-sql-statement-execution.jpg 2352w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-sql-statement-execution-300x138.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-sql-statement-execution-1024x469.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-sql-statement-execution-768x352.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-sql-statement-execution-1536x704.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-sql-statement-execution-2048x939.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-sql-statement-execution-1440x660.jpg 1440w\" sizes=\"auto, (max-width: 2352px) 100vw, 2352px\" \/>\n<div class=\"caption-center\">Tracing an SQL statement&#8217;s execution<\/div>\n<p>From this figure, we can learn the following information about the <code>INSERT INTO t VALUES (1), (2), (3);<\/code> statement:<\/p>\n<ul>\n<li>When TiDB processes this request, it performs three steps: compiling, planning, and executing.<\/li>\n<li>When TiDB executes this statement, it calls TiKV&#8217;s Prewrite remote procedure call (RPC) and Commit RPC.<\/li>\n<li>The request takes 5 ms.<\/li>\n<\/ul>\n<p>In the figure, a box represents an event called a &#8220;span.&#8221; Each span contains:<\/p>\n<ul>\n<li>An event name<\/li>\n<li>Start and end timestamps<\/li>\n<\/ul>\n<p>Spans are hierarchical, forming a parent-child relationship or a sequence relationship, as shown in the <a href=\"https:\/\/www.jaegertracing.io\/docs\/1.23\/architecture\/\">figure<\/a> below:<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1174\" class=\"wp-image-702\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/span-scaled.jpg\" alt=\"Spans\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/span-scaled.jpg 2560w, https:\/\/static.pingcap.com\/files\/2021\/06\/span-300x138.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/span-1024x469.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/span-768x352.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/span-1536x704.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/span-2048x939.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/span-1440x660.jpg 1440w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/>\n<div class=\"caption-center\">Spans<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Implementation\"><\/span>Implementation<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In this article, unless otherwise specified, we used the following test platform:<\/p>\n<ul>\n<li>CPU: Intel Core i7-8700<\/li>\n<li>Linux distros: Ubuntu 20.04<\/li>\n<li>Linux kernel: 5.4<\/li>\n<li>Memory: 32 GB<\/li>\n<li>Disk: NVM Express<sup>\u00ae<\/sup> (NVMe) SSD<\/li>\n<\/ul>\n<p>TiKV is written in Rust. The Rust community has several off-the-shelf tracing libraries, for example, <a href=\"https:\/\/github.com\/tokio-rs\/tracing\">tokio-tracing<\/a>, <a href=\"https:\/\/github.com\/sile\/rustracing\">rustracing<\/a>, and <a href=\"https:\/\/github.com\/open-telemetry\/opentelemetry-rust\">open-telemetry<\/a>. They all follow the OpenTracing specification, but <strong>their performance is not ideal<\/strong>. We discovered that TiKV&#8217;s performance can be reduced by more than 50% when introducing these libraries.<\/p>\n<p>TiKV&#8217;s <strong>tracing implementation impacts performance by less than 5% by tracing and collecting spans efficiently, which only takes 20 ns<\/strong>.<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2222\" height=\"1102\" class=\"wp-image-703\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/overhead-per-span.jpg\" alt=\"Overhead per span\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/overhead-per-span.jpg 2222w, https:\/\/static.pingcap.com\/files\/2021\/06\/overhead-per-span-300x149.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/overhead-per-span-1024x508.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/overhead-per-span-768x381.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/overhead-per-span-1536x762.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/overhead-per-span-2048x1016.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/overhead-per-span-1440x714.jpg 1440w\" sizes=\"auto, (max-width: 2222px) 100vw, 2222px\" \/>\n<div class=\"caption-center\">Overhead per span<\/div>\n<p>Now let&#8217;s see how it works.<\/p>\n<h3>Time measurements<\/h3>\n<p>Time is measured frequently in tracing. Each span needs two timestamps: the event&#8217;s start and end time. Therefore, timing performance is critical.<\/p>\n<p>Generally, to measure time for tracing, the measurement must ensure:<\/p>\n<ul>\n<li>The timestamp monotonically increases<\/li>\n<li>High performance<\/li>\n<li>High precision<\/li>\n<\/ul>\n<h4><code>std::Instant<\/code><\/h4>\n<p>Rust offers two structures in <code>std<\/code> to measure time:<\/p>\n<ul>\n<li><code>std::SystemTime::now()<\/code>It obtains the current system time. The downside is that the obtained timestamp is not monotonic, for example, when a user manually changes the clock, or there is a NTP service.<\/li>\n<li><code>std::Instant::now()<\/code>It obtains monotonically increasing timestamps with nanosecond precision. Most Rust tracing libraries use it, but its performance is not ideal: it takes 50 ns to obtain timestamps twice. This is one of the reasons why most Rust tracing libraries do not trace efficiently.<\/li>\n<\/ul>\n<h4>Coarse time<\/h4>\n<p>If you need efficient time measurement, coarse time is usually a good choice. Coarse time can be obtained by passing <code>CLOCK_MONOTONIC_COARSE<\/code> to the <code>clock_gettime<\/code> system call in Linux. The Rust community provides the <a href=\"https:\/\/docs.rs\/coarsetime\/0.1.18\/coarsetime\/\"><code>coarsetime<\/code><\/a> crate to read coarse time easily:<\/p>\n<p>{{&lt; copyable &#8220;&#8221; &gt;}}<\/p>\n<pre><code>coarsetime::Instant::now()\n<\/code><\/pre>\n<p>Coarse time is so fast that it takes only 10 ns to obtain two timestamps. Its precision is low though, depending on Linux&#8217;s configuration. The default precision is 4 ms.<\/p>\n<p><strong>In most cases, coarse time can be the first choice<\/strong>, as:<\/p>\n<ul>\n<li>It&#8217;s out of the box in the Linux system and easy to obtain.<\/li>\n<li>For most applications, the 4 ms precision is acceptable.<\/li>\n<\/ul>\n<p>However, in TiKV&#8217;s tracing scenarios, low-precision time measurement can lead to confusing results as shown below:<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"916\" class=\"wp-image-704\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/high-precision-vs-low-precision-time-measurement-scaled.jpg\" alt=\"High precision vs. low precision for time measurement\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/high-precision-vs-low-precision-time-measurement-scaled.jpg 2560w, https:\/\/static.pingcap.com\/files\/2021\/06\/high-precision-vs-low-precision-time-measurement-300x107.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/high-precision-vs-low-precision-time-measurement-1024x367.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/high-precision-vs-low-precision-time-measurement-768x275.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/high-precision-vs-low-precision-time-measurement-1536x550.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/high-precision-vs-low-precision-time-measurement-2048x733.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/high-precision-vs-low-precision-time-measurement-1440x515.jpg 1440w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/>\n<div class=\"caption-center\">High precision vs. low precision for time measurement<\/div>\n<p>To support tracing at microseconds or even nanoseconds precision while keeping performance, we use <strong>Time Stamp Counter (TSC)<\/strong>.<\/p>\n<h4>TSC<\/h4>\n<p>The TSC register has existed in modern x86 architecture CPUs since the Pentium processor introduced in 2003. It records the number of CPU clock cycles since reset. When the CPU clock rate is constant, it can be used for high-precision timing.<\/p>\n<p><strong>The TSC meets the requirements of monotonic increase, high precision, and high performance at the same time.<\/strong> In our test environment, it only takes 15 ns to fetch TSC twice.<\/p>\n<p>The TSC register cannot be used directly. When it&#8217;s used as a time source, it has several problems that affect the accuracy, as listed below.<\/p>\n<h5>TSC increment rate<\/h5>\n<p>TSC increment rate is determined by the CPU frequency. <strong>Modern CPUs may dynamically adjust its frequency<\/strong>, for example, to save energy. As a result, <strong>the TSC increment rate is not constant<\/strong>:<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"1788\" height=\"668\" class=\"wp-image-705\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/tsc-increment-rate.jpg\" alt=\"TSC increment rate\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-increment-rate.jpg 1788w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-increment-rate-300x112.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-increment-rate-1024x383.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-increment-rate-768x287.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-increment-rate-1536x574.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-increment-rate-1440x538.jpg 1440w\" sizes=\"auto, (max-width: 1788px) 100vw, 1788px\" \/>\n<div class=\"caption-center\">TSC increment rate<\/div>\n<p>In addition, <strong>some CPUs do not increase the TSC in sleep mode<\/strong>:<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"1786\" height=\"682\" class=\"wp-image-706\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/tsc-does-not-increment-when-cpu-sleeps.jpg\" alt=\"TSC doesn't increment when the CPU sleeps\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-does-not-increment-when-cpu-sleeps.jpg 1786w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-does-not-increment-when-cpu-sleeps-300x115.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-does-not-increment-when-cpu-sleeps-1024x391.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-does-not-increment-when-cpu-sleeps-768x293.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-does-not-increment-when-cpu-sleeps-1536x587.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-does-not-increment-when-cpu-sleeps-1440x550.jpg 1440w\" sizes=\"auto, (max-width: 1786px) 100vw, 1786px\" \/>\n<div class=\"caption-center\">TSC doesn&#8217;t increment when the CPU sleeps<\/div>\n<p>Some modern x86 architecture CPUs provide features to ensure the TSC increments at a constant rate. In Linux, you can use CPU flags in <code>\/proc\/cpuinfo<\/code> to check whether the TSC increment rate is constant:<\/p>\n<ul>\n<li><code>constant_tsc<\/code>: the TSC increments at the processor&#8217;s nominal frequency instead of instantaneous frequency.<\/li>\n<li><code>nonstop_tsc<\/code>: the TSC increments even when the CPU is in sleep state.<\/li>\n<\/ul>\n<h5>TSC synchronization<\/h5>\n<p>Even if TSC is constant, it cannot be used to measure time. In the x86 architecture, <strong>TSC registers are not guaranteed to be synchronized among all CPU cores<\/strong>. The following figure shows the measured TSC on a laptop produced in 2020 equipped with the latest x64 CPU at that time. As we can see, among the 16 cores, CPU 0&#8217;s TSC value has an offset.<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2142\" height=\"1304\" class=\"wp-image-707\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/tsc-values-for-16-cpu-cores.jpg\" alt=\"TSC values for 16 CPU cores\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-values-for-16-cpu-cores.jpg 2142w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-values-for-16-cpu-cores-300x183.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-values-for-16-cpu-cores-1024x623.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-values-for-16-cpu-cores-768x468.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-values-for-16-cpu-cores-1536x935.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-values-for-16-cpu-cores-2048x1247.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-values-for-16-cpu-cores-1440x877.jpg 1440w\" sizes=\"auto, (max-width: 2142px) 100vw, 2142px\" \/>\n<div class=\"caption-center\">TSC values for 16 CPU cores<\/div>\n<p>Recall that in tracing, each span needs two timestamps, representing the start and end of the event. Due to the thread scheduling, the reads of the two timestamps may happen on different CPU cores. As a result, the delta of the two TSC values can produce incorrect elapsed time <strong>when the TSC is not synchronized between cores.<\/strong><\/p>\n<p>For example:<\/p>\n<ol>\n<li>At t1, the thread is running on Core 1 and reads tsc1.<\/li>\n<li>The operating system schedules the thread from Core 1 to Core 2.<\/li>\n<li>At t2, the thread is running on Core 2 and reads tsc2.<\/li>\n<\/ol>\n<img loading=\"lazy\" decoding=\"async\" width=\"1720\" height=\"696\" class=\"wp-image-708\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/tsc-value-calculation-for-multiple-cores.jpg\" alt=\"TSC value calculation for multiple cores\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-calculation-for-multiple-cores.jpg 1720w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-calculation-for-multiple-cores-300x121.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-calculation-for-multiple-cores-1024x414.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-calculation-for-multiple-cores-768x311.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-calculation-for-multiple-cores-1536x622.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-calculation-for-multiple-cores-1440x583.jpg 1440w\" sizes=\"auto, (max-width: 1720px) 100vw, 1720px\" \/>\n<div class=\"caption-center\">An example of TSC value calculation for multiple cores<\/div>\n<p>In the example above, the elapsed TSC (tsc2 &#8211; tsc1) is even negative!<\/p>\n<p>To solve this problem, <strong>TiKV synchronizes the TSC value of each core by calculating TSC offsets:<\/strong><\/p>\n<ol>\n<li>TiKV arbitrarily takes two TSC values at two physical times on each core.<\/li>\n<li>The physical time is used as the x-axis, and the TSC values are used as the y-axis. They can be used to form a line.<\/li>\n<li>The TSC offset of each core is the gap of the line at y-axis, as shown in the following figure:<\/li>\n<\/ol>\n<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"957\" class=\"wp-image-709\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/tsc-value-offset-scaled.jpg\" alt=\"TSC value offset\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-offset-scaled.jpg 2560w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-offset-300x112.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-offset-1024x383.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-offset-768x287.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-offset-1536x574.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-offset-2048x766.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/tsc-value-offset-1440x538.jpg 1440w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/>\n<div class=\"caption-center\">TSC value offset<\/div>\n<p>When the TSC offset is calculated, the <strong>process of obtaining the TSC values must not be on the same core<\/strong>. In Linux, you can use <code>sched_setaffinity<\/code> to set the thread&#8217;s affinity to make the thread run on a certain core:<\/p>\n<p>{{&lt; copyable &#8220;&#8221; &gt;}}<\/p>\n<pre><code>fn set_affinity(cpuid: usize) -&gt; Result&lt;(), Error&gt; {\n\n  use libc::{cpu_set_t, sched_setaffinity, CPU_SET};\n\n  use std::mem::{size_of, zeroed};\n\n  let mut set = unsafe { zeroed::&lt;cpu_set_t&gt;() };\n\n  unsafe { CPU_SET(cpuid, &amp;mut set) };\n\n  \/\/ Set the current thread's core affinity.\n\n  if unsafe {\n\n      sched_setaffinity(\n\n          0, \/\/ Defaults to current thread\n\n          size_of::&lt;cpu_set_t&gt;(),\n\n          &amp;set as *const _,\n\n      )\n\n  } != 0\n\n  {\n\n      Err(std::io::Error::last_os_error().into())\n\n  } else {\n\n      Ok(())\n\n  }\n\n}\n<\/code><\/pre>\n<p>Now, we know the TSC offset of each CPU core. Next, we need to find out which CPU core we&#8217;re working on. This can be known by <strong>using the RDTSCP command<\/strong>, which helps us atomically get the TSC value and CPU ID at the same time.<\/p>\n<p>The Rust code is as follows:<\/p>\n<p>{{&lt; copyable &#8220;&#8221; &gt;}}<\/p>\n<pre><code>#[cfg(any(target_arch = \"x86\", target_arch = \"x86_64\"))]\n\nfn tsc_with_cpuid() -&gt; (u64, usize) {\n\n  #[cfg(target_arch = \"x86\")]\n\n  use core::arch::x86::__rdtscp;\n\n  #[cfg(target_arch = \"x86_64\")]\n\n  use core::arch::x86_64::__rdtscp;\n\n  let mut aux = std::mem::MaybeUninit::&lt;u32&gt;::uninit();\n\n  let tsc = unsafe { __rdtscp(aux.as_mut_ptr()) };\n\n  let aux = unsafe { aux.assume_init() };\n\n  \/\/ IA32_TSC_AUX are encoded by Linux kernel as follow format:\n\n  \/\/\n\n  \/\/ 31       12 11      0\n\n  \/\/ [ node id ][ cpu id ]\n\n  (tsc, (aux &amp; 0xfff) as usize)\n\n}\n<\/code><\/pre>\n<p>The high-precision and efficient time measurement solution described above has been extracted into a Rust library named <a href=\"https:\/\/github.com\/zhongzc\/minstant.git\">minstant<\/a>, which can be reused for similar requirements.<\/p>\n<h3>Span collection<\/h3>\n<p>Spans may be generated on each thread and finally be collected and aggregated into a trace. Therefore, we need a cross-thread span collection mechanism. <strong>Span collection is a common performance bottleneck of tracing libraries<\/strong>.<\/p>\n<p>Common methods to perform thread-safe span collection are:<\/p>\n<ul>\n<li><code>Arc&lt;Mutex&lt;Vec&lt;Span&gt;&gt;&gt;<\/code><\/li>\n<li><code>std::sync::mpsc::Receiver&lt;Span&gt;<\/code><\/li>\n<li><code>crossbeam::channel::Receiver&lt;Span&gt;<\/code><\/li>\n<\/ul>\n<p>Among these methods, <code>crossbeam::channel<\/code> is the best choice. It takes about 40 ns to send and collect a span. This is not the end. To improve performance, TiKV uses a different method than the above three to collect spans:<\/p>\n<ol>\n<li>For the same thread, TiKV only collects spans on the local thread without any resource contention.<\/li>\n<li>A batch of spans are collected and sent to the global collector.<\/li>\n<\/ol>\n<h4>Local span<\/h4>\n<p>TiKV maintains a thread-local structure <code>LocalSpanLine<\/code> for each thread. It generates and stores <code>LocalSpan<\/code>. Another thread-local structure <code>LocalCollector<\/code> drives <code>LocalSpanLine<\/code> and collects <code>LocalSpan<\/code>. The following figure shows the relationship between these three.<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1105\" class=\"wp-image-710\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/localspanline-in-tikv-scaled.jpg\" alt=\"LocalSpanLine in TiKV\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/localspanline-in-tikv-scaled.jpg 2560w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspanline-in-tikv-300x130.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspanline-in-tikv-1024x442.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspanline-in-tikv-768x332.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspanline-in-tikv-1536x663.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspanline-in-tikv-2048x884.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspanline-in-tikv-1440x622.jpg 1440w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/>\n<div class=\"caption-center\">LocalSpanLine in TiKV<\/div>\n<p>Because <code>LocalSpan<\/code>, <code>LocalSpanLine<\/code>, and <code>LocalCollector<\/code> are all thread-local, they don&#8217;t introduce contentions and <strong>don&#8217;t harm caches<\/strong>. As a result, the performance is extremely high: it only takes 4 ns to push a span into a local vector, as simple as a <code>Vec::push<\/code>.<\/p>\n<p>In addition, when constructing the span hierarchy, thread-local variables can be used to implement the <strong>implicit context<\/strong> mechanism: <strong>users don&#8217;t need to modify the function signature to pass the tracing context<\/strong>, which greatly reduces the intrusion of existing code.<\/p>\n<p>Now, let&#8217;s take a closer look at the implementation details of <code>LocalSpan<\/code> creation and collection.<\/p>\n<p><code>LocalSpanLine<\/code> maintains a container, <code>SpanQueue<\/code>, to load the ongoing or completed <code>LocalSpan<\/code>s. &#8220;Ongoing&#8221; means that the start time of the event indicated by <code>LocalSpan<\/code> is known, but the end time is not. The <code>LocalSpan<\/code>s are stored in the <code>Vec<\/code> structure inside <code>SpanQueue<\/code>.<\/p>\n<p>To construct the parent-child dependency between <code>LocalSpan<\/code>s implicitly, the context passing relies on the <code>next_parent_id<\/code> variable that is maintained by <code>SpanQueue<\/code>.<\/p>\n<p>Next, we will use some examples to describe the process in detail.<\/p>\n<p>Suppose we have a foo event. It occurs at 09:00 and lasts until 09:03:<\/p>\n<p>{{&lt; copyable &#8220;&#8221; &gt;}}<\/p>\n<pre><code>09:00  foo +\n\n09:01      |\n\n09:02      |\n\n09:03      +\n<\/code><\/pre>\n<p><code>SpanQueue<\/code>&#8216;s initial state is empty, and <code>next_parent_id<\/code> is <code>root<\/code>. When foo occurs at 09:00, <code>SpanQueue<\/code> performs the following:<\/p>\n<ol>\n<li>Adds a new record and fills in the event name <code>foo<\/code>, start time <code>09:00<\/code>, and end time <code>empty<\/code>.<\/li>\n<li>Assigns <code>next_parent_id<\/code>&#8216;s value to foo&#8217;s parent.<\/li>\n<li>Updates <code>next_parent_id<\/code> to <code>foo<\/code>.<\/li>\n<li>Returns the index value <code>0<\/code> to receive the notification of the event end and backfill the end time.<\/li>\n<\/ol>\n<table>\n<tbody>\n<tr>\n<td>Index<\/td>\n<td>Event<\/td>\n<td>Start time<\/td>\n<td>End time<\/td>\n<td>Parent<\/td>\n<td>next_parent_id<\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>foo<\/td>\n<td>09:00<\/td>\n<td>N\/A<\/td>\n<td>root<\/td>\n<td>foo<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>At the end of foo (at 09:03), the user submits the index and informs <code>SpanQueue<\/code> that the event is over. <code>SpanQueue<\/code> starts backfilling. It:<\/p>\n<ol>\n<li>Indexes to the record where the foo event is located by the submitted index.<\/li>\n<li>Backfills the end time as <code>09:03<\/code>.<\/li>\n<li>Updates <code>next_parent_id<\/code> to the record&#8217;s parent.<\/li>\n<\/ol>\n<table>\n<tbody>\n<tr>\n<td>Index<\/td>\n<td>Event<\/td>\n<td>Start time<\/td>\n<td>End time<\/td>\n<td>Parent<\/td>\n<td>next_parent_id<\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>foo<\/td>\n<td>09:00<\/td>\n<td>09:03<\/td>\n<td>root<\/td>\n<td>root<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The above example describes a single event&#8217;s recording process. It&#8217;s simple and effective. In fact, recording multiple events is only a repetition of the above process. For example, in the following process, the foo event contains two sub-events: bar and baz.<\/p>\n<p>{{&lt; copyable &#8220;&#8221; &gt;}}<\/p>\n<pre><code>09:00  foo +\n\n09:01      | bar +\n\n09:02      |     |\n\n09:03      |     +\n\n09:04      |\n\n09:05      | baz +\n\n09:06      |     |\n\n09:07      |     +\n\n09:08      +\n<\/code><\/pre>\n<p>As mentioned above, in addition to recording the start and end time of each event, <code>SpanQueue<\/code> also needs to record the parent-child dependency between each event. In the following example, when foo occurs, <code>SpanQueue<\/code>&#8216;s storage content is the same as above. When bar occurs, <code>SpanQueue<\/code> sets the bar&#8217;s parent to <code>next_parent_id<\/code>&#8216;s current value (foo) and and updates <code>next_parent_id<\/code> to bar:<\/p>\n<table>\n<tbody>\n<tr>\n<td>Index<\/td>\n<td>Event<\/td>\n<td>Start time<\/td>\n<td>End time<\/td>\n<td>Parent<\/td>\n<td>next_parent_id<\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>foo<\/td>\n<td>09:00<\/td>\n<td>N\/A<\/td>\n<td>root<\/td>\n<td rowspan=\"2\">bar<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>bar<\/td>\n<td>09:01<\/td>\n<td>N\/A<\/td>\n<td>foo<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>When bar is over, it updates bar&#8217;s end time and the <code>next_parent_id<\/code> variable based on the backfilling steps mentioned above:<\/p>\n<table>\n<tbody>\n<tr>\n<td>Index<\/td>\n<td>Event<\/td>\n<td>Start time<\/td>\n<td>End time<\/td>\n<td>Parent<\/td>\n<td>next_parent_id<\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>foo<\/td>\n<td>09:00<\/td>\n<td>N\/A<\/td>\n<td>root<\/td>\n<td rowspan=\"2\">foo<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>bar<\/td>\n<td>09:01<\/td>\n<td>09:03<\/td>\n<td>foo<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Finally, by repeating the above steps, <code>SpanQueue<\/code> records the information of these three events in an efficient way:<\/p>\n<table>\n<tbody>\n<tr>\n<td>Index<\/td>\n<td>Event<\/td>\n<td>Start time<\/td>\n<td>End time<\/td>\n<td>Parent<\/td>\n<td>next_parent_id<\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>foo<\/td>\n<td>09:00<\/td>\n<td>09:08<\/td>\n<td>root<\/td>\n<td rowspan=\"3\">root<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>bar<\/td>\n<td>09:01<\/td>\n<td>09:03<\/td>\n<td>foo<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>baz<\/td>\n<td>09:05<\/td>\n<td>09:07<\/td>\n<td>foo<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We can connect these records in series to form the following trace tree structure:<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"515\" class=\"wp-image-711\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/trace-tree-structure-scaled.jpg\" alt=\"Trace tree structure\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/trace-tree-structure-scaled.jpg 2560w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-tree-structure-300x60.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-tree-structure-1024x206.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-tree-structure-768x155.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-tree-structure-1536x309.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-tree-structure-2048x412.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-tree-structure-1440x290.jpg 1440w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/>\n<div class=\"caption-center\">Trace tree structure<\/div>\n<h4>Normal span<\/h4>\n<p>Although the recording of <code>LocalSpan<\/code> is efficient, it&#8217;s not flexible enough due to its thread-local implementation. For example, in an asynchronous scenario, the creation and termination of some spans occur in different threads, so that the thread-local implementation doesn&#8217;t work any longer.<\/p>\n<p>For these problems, TiKV retains the thread-safe span recording method described at the beginning of the article: <code>crossbeam::channel<\/code> collects a single span at a time. This is called a <code>NormalSpan<\/code>.<\/p>\n<p>From an implementation point of view, <code>NormalSpan<\/code> information is not recorded in the thread-local container, but is maintained by the corresponding variables to facilitate cross-thread movement. The parent-child relationship between <code>NormalSpan<\/code>s is no longer implicitly constructed by the thread local. You need to specify it manually.<\/p>\n<p>However, <code>NormalSpan<\/code> and <code>LocalSpan<\/code> are not completely isolated. TiKV connects them through the following interactive methods. A set of <code>LocalSpan<\/code>s collected from <code>LocalCollector<\/code> <strong>can be mounted on <code>NormalSpan<\/code> as a subtree,<\/strong> as shown in the following figure. The number of mounts is unlimited. By allowing many-to-many mounting methods, TiKV supports tracing batch scenarios to a certain extent. Most tracing libraries in the community do not offer this.<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2216\" height=\"1344\" class=\"wp-image-712\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/localspan-can-be-mounted-on-normalspan.jpg\" alt=\"LocalSpans can be mounted on NormalSpan\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/localspan-can-be-mounted-on-normalspan.jpg 2216w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspan-can-be-mounted-on-normalspan-300x182.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspan-can-be-mounted-on-normalspan-1024x621.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspan-can-be-mounted-on-normalspan-768x466.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspan-can-be-mounted-on-normalspan-1536x932.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspan-can-be-mounted-on-normalspan-2048x1242.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/localspan-can-be-mounted-on-normalspan-1440x873.jpg 1440w\" sizes=\"auto, (max-width: 2216px) 100vw, 2216px\" \/>\n<div class=\"caption-center\">LocalSpans can be mounted on NormalSpan<\/div>\n<p>The above implementation forms <strong>the fast and slow paths of span collection.<\/strong> They work together to complete the recording of a request&#8217;s execution path information:<\/p>\n<ul>\n<li><code>LocalSpan<\/code> cannot span threads but records efficiently. Collecting <code>LocalSpan<\/code>s in batches and then mounting them to ordinary spans means very low overhead.<\/li>\n<li>The recording of ordinary span is slow, but it can be passed across threads and is more flexible to use.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"How_to_use_the_high-performance_tracing_library\"><\/span>How to use the high-performance tracing library<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We extracted TiKV&#8217;s high-performance tracing logic into an independent library <a href=\"https:\/\/github.com\/tikv\/minitrace-rust\">minitrace-rust<\/a>. To use it, the general steps are:<\/p>\n<ol>\n<li>When the request arrives, create the corresponding root span.<\/li>\n<li>When the request is on the execution path, record the event occurrence.<\/li>\n<li>When the request is complete, collect all spans generated on the execution path.<\/li>\n<\/ol>\n<h3>Create and collect a root span<\/h3>\n<p>Usually, we can create a root span at the beginning of a request:<\/p>\n<p>{{&lt; copyable &#8220;&#8221; &gt;}}<\/p>\n<pre><code>for req in listener.incoming() {\n\n  let (root_span, collector) = Span::root(\"http request\");\n\n  let guard = root_span.enter();\n\n  my_request_handler(req);\n\n}\n<\/code><\/pre>\n<p>By using the &#8220;guard,&#8221; the span can be ended automatically when the scope exits. In addition to returning the root span, <code>Span::root(event)<\/code> also returns a collector. The collector has a one-to-one correspondence with the root span. When the request is completed, we can call the collector&#8217;s collect method to collect all spans generated on the execution path.<\/p>\n<p>{{&lt; copyable &#8220;&#8221; &gt;}}<\/p>\n<pre><code>let (root_span, collector) = Span::root(\"http request\");\n\nlet guard = root_span.enter();\n\nhandle_http_request(req);\n\ndrop((guard, root_span));\n\nlet spans = collector.collect();\n<\/code><\/pre>\n<h3>Event record<\/h3>\n<p>We recommend that you use the trace and the <code>trace_async<\/code> macro provided by minitrace-rust for function-level event recording. This method records the following execution information for a single function:<\/p>\n<ul>\n<li>The call occurring time<\/li>\n<li>The call return time<\/li>\n<li>The direct or indirect caller&#8217;s reference<\/li>\n<li>The reference of the sub-function called directly or indirectly<\/li>\n<\/ul>\n<p>The following example traces two synchronous functions, foo and bar, by adding <code>trace(event)<\/code> as the attribute of these two functions. We can record the function&#8217;s execution information:<\/p>\n<p>{{&lt; copyable &#8220;&#8221; &gt;}}<\/p>\n<pre><code>#[trace(\"foo\")]\n\nfn foo() -&gt; u32 {\n\n  bar();\n\n  42\n\n}\n\n#[trace(\"bar\")]\n\nfn bar() { }\n<\/code><\/pre>\n<p>The final recorded information includes the two functions&#8217; start and end times and the function call relationship: foo calls bar.<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"696\" class=\"wp-image-713\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/foo-calls-bar-scaled.jpg\" alt=\"foo calls bar\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/foo-calls-bar-scaled.jpg 2560w, https:\/\/static.pingcap.com\/files\/2021\/06\/foo-calls-bar-300x82.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/foo-calls-bar-1024x278.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/foo-calls-bar-768x209.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/foo-calls-bar-1536x417.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/foo-calls-bar-2048x557.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/foo-calls-bar-1440x391.jpg 1440w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/>\n<div class=\"caption-center\">foo calls bar<\/div>\n<p>To record asynchronous functions, the steps are slightly different:<\/p>\n<ol>\n<li>Replace trace with <code>trace_async<\/code> as follows:{{&lt; copyable &#8220;&#8221; &gt;}}\n<pre><code>#[trace_async(\"foo async\")]\n\nasync fn foo_aysnc() -&gt; u32 {\n\n  bar_async().await;\n\n  42\n\n}\n\n#[trace_async(\"bar async\")]\n\nasync fn bar_async() {\n\n  yield_now().await;\n\n}\n<\/code><\/pre>\n<\/li>\n<li>Bind the future to a span. Wrap the task with the future adapter <code>in_span<\/code> provided by minitrace-rust.In the Rust asynchronous context, a task refers to the future spawned to an executor, also known as the root future. For example, the following <code>foo_async<\/code> is a task:{{&lt; copyable &#8220;&#8221; &gt;}}\n<pre><code>executor::spawn(\n\n  foo_async()\n\n);\n<\/code><\/pre>\n<p>Suppose you want to trace a task like <code>foo_async<\/code> and bind it to a span created by <code>Span::from_local_parent(event)<\/code>. The application code is as follows:<\/p>\n<p>{{&lt; copyable &#8220;&#8221; &gt;}}<\/p>\n<pre><code>executor::spawn(\n\n  foo_async().in_span(Span::from_local_parent(\"Task: foo_async\"))\n\n);\n<\/code><\/pre>\n<\/li>\n<\/ol>\n<p>The following figure shows the result of the trace:<\/p>\n<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"682\" class=\"wp-image-714\" src=\"https:\/\/en.pingcap.com\/wp-content\/uploads\/2021\/06\/trace-foo-async-task-scaled.jpg\" alt=\"Trace the foo_async task\" srcset=\"https:\/\/static.pingcap.com\/files\/2021\/06\/trace-foo-async-task-scaled.jpg 2560w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-foo-async-task-300x80.jpg 300w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-foo-async-task-1024x273.jpg 1024w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-foo-async-task-768x205.jpg 768w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-foo-async-task-1536x409.jpg 1536w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-foo-async-task-2048x545.jpg 2048w, https:\/\/static.pingcap.com\/files\/2021\/06\/trace-foo-async-task-1440x383.jpg 1440w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/>\n<div class=\"caption-center\">Trace the foo_async task<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>TiKV is a distributed key-value database. Adding an observation feature to TiKV can be difficult, because it has higher performance requirements than an ordinary application. In addition to tracing, TiKV and its upper SQL database TiDB also have other challenging observational requirements. <a href=\"https:\/\/pingcap.com\/\">PingCAP<\/a>&#8216;s Insight team focuses on solving such observation problems and implementing features. If you&#8217;re interested, join the Slack channel <a href=\"https:\/\/slack.tidb.io\/invite?team=tidb-community&amp;channel=sig-diagnosis&amp;ref=tracing\">#sig-diagnosis<\/a> for discussions!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As a key-value database, TiKV has much higher performance requirements than a regular application, so tracing tools must have minimal impact. Learn how we trace TiKV requests while impacting performance less than 5%.<\/p>","protected":false},"author":81,"featured_media":716,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[22],"class_list":["post-700","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-engineering","tag-tikv"],"acf":[],"featured_image_src":"https:\/\/static.pingcap.com\/files\/2021\/06\/key-value-database-tracing-tools-scaled.jpg","author_info":{"display_name":"Zhenchi Zhong","author_link":"https:\/\/www.pingcap.com\/ko\/blog\/author\/zhenchi-zhong\/"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How We Trace a KV Database with Less than 5% Performance Impact | TiDB<\/title>\n<meta name=\"description\" content=\"In this post, we will share how we traced request times in TiKV with under 5% performance impact, which only takes 20 ns.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How We Trace a KV Database with Less than 5% Performance Impact | TiDB\" \/>\n<meta property=\"og:description\" content=\"In this post, we will share how we traced request times in TiKV with under 5% performance impact, which only takes 20 ns.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pingcap.com\/ko\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/\" \/>\n<meta property=\"og:site_name\" content=\"TiDB\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"article:published_time\" content=\"2021-06-30T00:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-03-04T14:47:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2021\/06\/key-value-database-tracing-tools-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"854\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Zhenchi Zhong\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"Zhenchi Zhong\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"17\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/\"},\"author\":{\"name\":\"Zhenchi Zhong\",\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/#\\\/schema\\\/person\\\/6c3829e8086cb12eb54543cab28ecbee\"},\"headline\":\"How We Trace a KV Database with Less than 5% Performance Impact\",\"datePublished\":\"2021-06-30T00:00:00+00:00\",\"dateModified\":\"2025-03-04T14:47:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/\"},\"wordCount\":2654,\"publisher\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/static.pingcap.com\\\/files\\\/2021\\\/06\\\/key-value-database-tracing-tools-scaled.jpg\",\"keywords\":[\"TiKV\"],\"articleSection\":[\"Engineering\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/\",\"url\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/\",\"name\":\"How We Trace a KV Database with Less than 5% Performance Impact | TiDB\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/static.pingcap.com\\\/files\\\/2021\\\/06\\\/key-value-database-tracing-tools-scaled.jpg\",\"datePublished\":\"2021-06-30T00:00:00+00:00\",\"dateModified\":\"2025-03-04T14:47:41+00:00\",\"description\":\"In this post, we will share how we traced request times in TiKV with under 5% performance impact, which only takes 20 ns.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/#primaryimage\",\"url\":\"https:\\\/\\\/static.pingcap.com\\\/files\\\/2021\\\/06\\\/key-value-database-tracing-tools-scaled.jpg\",\"contentUrl\":\"https:\\\/\\\/static.pingcap.com\\\/files\\\/2021\\\/06\\\/key-value-database-tracing-tools-scaled.jpg\",\"width\":2560,\"height\":854},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/blog\\\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.pingcap.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How We Trace a KV Database with Less than 5% Performance Impact\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/#website\",\"url\":\"https:\\\/\\\/www.pingcap.com\\\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at Scale\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.pingcap.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\\\/\\\/www.pingcap.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/static.pingcap.com\\\/files\\\/2021\\\/11\\\/pingcap-logo.png\",\"contentUrl\":\"https:\\\/\\\/static.pingcap.com\\\/files\\\/2021\\\/11\\\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/facebook.com\\\/pingcap2015\",\"https:\\\/\\\/x.com\\\/PingCAP\",\"https:\\\/\\\/linkedin.com\\\/company\\\/pingcap\",\"https:\\\/\\\/youtube.com\\\/channel\\\/UCuq4puT32DzHKT5rU1IZpIA\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.pingcap.com\\\/#\\\/schema\\\/person\\\/6c3829e8086cb12eb54543cab28ecbee\",\"name\":\"Zhenchi Zhong\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\\\/\\\/static.pingcap.com\\\/files\\\/2022\\\/10\\\/17234942\\\/avatar.jpg\",\"url\":\"https:\\\/\\\/static.pingcap.com\\\/files\\\/2022\\\/10\\\/17234942\\\/avatar.jpg\",\"contentUrl\":\"https:\\\/\\\/static.pingcap.com\\\/files\\\/2022\\\/10\\\/17234942\\\/avatar.jpg\",\"caption\":\"Zhenchi Zhong\"},\"url\":\"https:\\\/\\\/www.pingcap.com\\\/ko\\\/blog\\\/author\\\/zhenchi-zhong\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How We Trace a KV Database with Less than 5% Performance Impact | TiDB","description":"In this post, we will share how we traced request times in TiKV with under 5% performance impact, which only takes 20 ns.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pingcap.com\/ko\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/","og_locale":"ko_KR","og_type":"article","og_title":"How We Trace a KV Database with Less than 5% Performance Impact | TiDB","og_description":"In this post, we will share how we traced request times in TiKV with under 5% performance impact, which only takes 20 ns.","og_url":"https:\/\/www.pingcap.com\/ko\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","article_published_time":"2021-06-30T00:00:00+00:00","article_modified_time":"2025-03-04T14:47:41+00:00","og_image":[{"width":2560,"height":854,"url":"https:\/\/static.pingcap.com\/files\/2021\/06\/key-value-database-tracing-tools-scaled.jpg","type":"image\/jpeg"}],"author":"Zhenchi Zhong","twitter_card":"summary_large_image","twitter_creator":"@PingCAP","twitter_site":"@PingCAP","twitter_misc":{"\uae00\uc4f4\uc774":"Zhenchi Zhong","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"17\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/#article","isPartOf":{"@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/"},"author":{"name":"Zhenchi Zhong","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/6c3829e8086cb12eb54543cab28ecbee"},"headline":"How We Trace a KV Database with Less than 5% Performance Impact","datePublished":"2021-06-30T00:00:00+00:00","dateModified":"2025-03-04T14:47:41+00:00","mainEntityOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/"},"wordCount":2654,"publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2021\/06\/key-value-database-tracing-tools-scaled.jpg","keywords":["TiKV"],"articleSection":["Engineering"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/","url":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/","name":"How We Trace a KV Database with Less than 5% Performance Impact | TiDB","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/#primaryimage"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2021\/06\/key-value-database-tracing-tools-scaled.jpg","datePublished":"2021-06-30T00:00:00+00:00","dateModified":"2025-03-04T14:47:41+00:00","description":"In this post, we will share how we traced request times in TiKV with under 5% performance impact, which only takes 20 ns.","breadcrumb":{"@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/#primaryimage","url":"https:\/\/static.pingcap.com\/files\/2021\/06\/key-value-database-tracing-tools-scaled.jpg","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/06\/key-value-database-tracing-tools-scaled.jpg","width":2560,"height":854},{"@type":"BreadcrumbList","@id":"https:\/\/www.pingcap.com\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingcap.com\/"},{"@type":"ListItem","position":2,"name":"How We Trace a KV Database with Less than 5% Performance Impact"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"\ud2f0DB","description":"TiDB | SQL at Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]},{"@type":"Person","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/6c3829e8086cb12eb54543cab28ecbee","name":"Zhenchi Zhong","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","url":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","contentUrl":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","caption":"Zhenchi Zhong"},"url":"https:\/\/www.pingcap.com\/ko\/blog\/author\/zhenchi-zhong\/"}]}},"grav_blocks":false,"card_markup":"<a class=\"card-resource bg-white\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/how-we-trace-a-kv-database-with-less-than-5-percent-performance-impact\/\"><div class=\"card-resource__image-container\"><img class=\"card-resource__image\" alt=\"key-value-database-tracing-tools.jpg\" src=\"https:\/\/static.pingcap.com\/files\/2021\/06\/key-value-database-tracing-tools-scaled.jpg\" loading=\"lazy\" width=2560 height=854 \/><\/div><div class=\"card-resource__content-container\"><div class=\"card-resource__content-head\"><div class=\"card-resource__category\">Engineering<\/div><\/div><h5 class=\"card-resource__title\">How We Trace a KV Database with Less than 5% Performance Impact<\/h5><\/div><\/a>","_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/700","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/users\/81"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/comments?post=700"}],"version-history":[{"count":5,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/700\/revisions"}],"predecessor-version":[{"id":25514,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/700\/revisions\/25514"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media\/716"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=700"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/categories?post=700"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/tags?post=700"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}