
I’m a fast talker, but standard tools treat every platform like a dry JIRA ticket. To fix this, I dived into Chrome extension development to create Speak It: a voice-to-text app that learns your style without recording your secrets.

Using privacy-first AI, the system maps a “fingerprint” of your speech—focusing on formality and sentence length—rather than storing raw content. Powered by TiDB vector search, it delivers personalized formatting that satisfies even the pickiest enterprise legal teams by ensuring data is never harvested.

In this blog, I’ll break down how to build a transcription tool that adapts your voice to any platform—from Slack to Gmail—while keeping your data completely off the server. You’ll see the full technical stack as well as the “statistical fingerprinting” logic used to learn personal writing styles without ever storing an actual message.

The Technical Stack: TiDB, Claude, and Deepgram

Here’s what I used and why:

  • Chrome Extension: The app needs to work on any website, not just one platform. A browser extension was the only way to inject a mic button into Gmail, Slack, Notion, Twitter, and everywhere else.
  • Web Speech API + Deepgram: Chrome and Edge support the Web Speech API for free. For browsers that don’t (Arc, Safari, Firefox), I fall back to Deepgram’s streaming API. This keeps costs low for most users while maintaining broad compatibility (the fallback logic is sketched right after this list).
  • TiDB Cloud Starter: I didn’t want to run two databases (one for normal data and one for vectors). Fortunately, TiDB handles both vectors and business data in one database. It’s also MySQL-compatible, so I could stick with what I already know, and it scales to zero when idle, so I’m not paying for unused capacity.
  • Claude Sonnet 4: I use Claude Sonnet 4 as the formatting engine. It takes raw transcripts and reformats them based on context and style instructions. It’s great because Sonnet follows constraints well without over-editing (which is extremely important in this context).
  • OpenAI Embeddings: For embeddings, I use text-embedding-3-small with OpenAI. It generates vector representations of writing style samples. These power the similarity matching for style clustering.
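
Here’s roughly how the Web Speech / Deepgram fallback fits together. This is a minimal sketch: startDeepgramStream is a hypothetical helper standing in for the WebSocket client that streams audio to Deepgram.

type TranscriptHandler = (text: string, isFinal: boolean) => void;

// Hypothetical helper that opens Deepgram's streaming API over WebSocket
declare function startDeepgramStream(onTranscript: TranscriptHandler): void;

function startTranscription(onTranscript: TranscriptHandler): void {
  const SpeechRecognition =
    (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

  if (SpeechRecognition) {
    // Chrome / Edge: the built-in Web Speech API is free
    const recognition = new SpeechRecognition();
    recognition.continuous = true;
    recognition.interimResults = true;
    recognition.onresult = (event: any) => {
      const result = event.results[event.results.length - 1];
      onTranscript(result[0].transcript, result.isFinal);
    };
    recognition.start();
  } else {
    // Arc, Safari, Firefox: fall back to Deepgram streaming
    startDeepgramStream(onTranscript);
  }
}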

The Architecture: Personalization Without Storing User Content

Here’s how data flows through the system:

[User speaks] 
      ↓
[Deepgram / Web Speech API]
      ↓
[Raw transcript]
      ↓
[Context detection: Gmail? Slack? Twitter?]
      ↓
[Fetch style profile from TiDB]
      ↓
[Claude formats transcript using style + context]
      ↓
[User accepts or rejects suggestion]
      ↓
[Extract stats from accepted text]
      ↓
[Update style profile in TiDB]
      ↓
[Generate embedding for similarity matching]

The key architectural decision was storing stats, not content. Here’s what goes into a style profile:

Field                 Type          Example
avg_sentence_length   float         14.2
formality_score       float (0-1)   0.35
uses_contractions     boolean       true
greetings             JSON array    ["Hey", "Hi there"]
signoffs              JSON array    ["Thanks", "Cheers"]
top_phrases           JSON array    ["sounds good", "let me know"]

None of this is the actual message. It’s a fingerprint of how you write, not what you write.

Enterprise customers won’t touch a tool that stores their internal communications. This constraint shaped every design decision.
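
To make that concrete, here’s a rough sketch of the kind of extraction that produces the fingerprint. The exact heuristics in Speak It may differ; the point is that only aggregates ever leave this function, never the message itself.

interface TextStats {
  avg_sentence_length: number;
  uses_contractions: boolean;
  greetings: string[];
}

function extractStyleStats(text: string): TextStats {
  // Split on sentence-ending punctuation and drop empty fragments
  const sentences = text.split(/[.!?]+/).map((s) => s.trim()).filter(Boolean);
  const wordCounts = sentences.map((s) => s.split(/\s+/).length);
  const avgLength =
    wordCounts.reduce((sum, n) => sum + n, 0) / Math.max(wordCounts.length, 1);

  // Contractions like "don't", "I'm", "we'll"
  const usesContractions = /\b\w+'(t|s|re|ve|ll|d|m)\b/i.test(text);

  // Capture an opening greeting if the message starts with one
  const greetingMatch = text.match(/^(hey|hi there|hi|hello|good (morning|afternoon))\b/i);

  return {
    avg_sentence_length: avgLength,
    uses_contractions: usesContractions,
    greetings: greetingMatch ? [greetingMatch[0]] : [],
  };
}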

Implementing Real-Time Context Detection for Gmail, Slack, and X

Different platforms have different norms. For example, LinkedIn tends to be much more formal compared to X. And a Slack message shouldn’t read like an email. So the first thing I did was figure out where the user would be typing.

Context Detection

The extension matches the current URL against known patterns, then looks for platform-specific DOM selectors to find the active text field:

const CONTEXT_PATTERNS = {
  email: {
    urls: [/mail\.google\.com/, /outlook\.live\.com/, /outlook\.office\.com/],
    selectors: [
      '[aria-label="Message Body"]',
      '[role="textbox"][aria-multiline="true"]',
      'div[contenteditable="true"][g_editable="true"]',
    ],
  },
  slack: {
    urls: [/\.slack\.com/],
    selectors: [
      '[data-qa="message_input"]',
      '.ql-editor',
      '[contenteditable="true"][data-message-input]',
    ],
  },
  twitter: {
    urls: [/twitter\.com/, /x\.com/],
    selectors: [
      '[data-testid="tweetTextarea_0"]',
      '[role="textbox"][data-testid]',
    ],
  },
  // ... 20+ contexts total
};

This detection runs before any formatting happens. The detected context determines both how Claude formats the text and what platform-specific instructions it receives.

For example, X (formerly Twitter) formatting keeps things brief and removes formal greetings while email formatting preserves sign-offs and adds paragraph breaks. And Slack sits somewhere in between.
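
Tying URL matching and selector lookup together, the detection step itself stays small. This is a simplified sketch, assuming the CONTEXT_PATTERNS map above; it ignores details like editors that load after the page does.

// Returns the detected context plus the editable element the mic button attaches to
function detectContext(): { context: string; field: Element | null } {
  const url = window.location.href;

  for (const [context, pattern] of Object.entries(CONTEXT_PATTERNS)) {
    if (!pattern.urls.some((re) => re.test(url))) continue;

    // First selector that resolves to an element wins
    for (const selector of pattern.selectors) {
      const field = document.querySelector(selector);
      if (field) return { context, field };
    }
    return { context, field: null }; // right site, but no editor open yet
  }

  return { context: "generic", field: null };
}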

Designing a Privacy-Focused Style Profile Schema in TiDB

The style profile lives in TiDB. Here’s the table structure:

CREATE TABLE user_style_profiles (
  user_id VARCHAR(255) PRIMARY KEY,
  avg_sentence_length FLOAT DEFAULT 12,
  formality_score FLOAT DEFAULT 0.5,
  uses_contractions BOOLEAN DEFAULT TRUE,
  top_phrases JSON,
  greetings JSON,
  signoffs JSON,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

Notice there’s no message_content column. We’re storing how you write, not what you write.

The formality_score ranges from 0 (very casual) to 1 (very formal). This gets calculated from signals like sentence length, punctuation patterns, and word choice. Someone who writes “Hey! Quick question, can u send that over?” scores lower than someone who writes “Good afternoon. I wanted to follow up regarding the materials.”
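
The scoring function itself isn’t shown in the post, and the version used later in the learning loop (calculateFormality) operates on extracted stats. As an illustration only, here’s a plausible text-level scorer built from those same signals; the name and thresholds are my own.

function scoreFormality(text: string): number {
  let score = 0.5; // start neutral

  const sentences = text.split(/[.!?]+/).filter((s) => s.trim());
  const words = text.split(/\s+/).filter(Boolean);
  const avgSentenceLength = words.length / Math.max(sentences.length, 1);

  if (avgSentenceLength > 18) score += 0.15; // long sentences read as formal
  if (avgSentenceLength < 8) score -= 0.15;  // short, choppy sentences read as casual

  if (/!|\.\.\./.test(text)) score -= 0.1;                                       // exclamations, ellipses
  if (/\b(u|gonna|wanna|lol|btw)\b/i.test(text)) score -= 0.2;                   // chat shorthand
  if (/\b(regarding|furthermore|sincerely|per our)\b/i.test(text)) score += 0.2; // formal markers

  return Math.min(1, Math.max(0, score)); // clamp to the 0-1 range
}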

Fetching a profile is a simple query:

import { RowDataPacket } from "mysql2/promise";

// `connection` is a mysql2/promise connection to TiDB, created at startup
async function getUserStyleProfile(userId: string): Promise<StyleProfile | null> {
  const [rows] = await connection.execute<RowDataPacket[]>(
    `SELECT avg_sentence_length, formality_score, uses_contractions,
            top_phrases, greetings, signoffs
     FROM user_style_profiles WHERE user_id = ?`,
    [userId]
  );
  
  if (rows.length === 0) return null;
  
  const row = rows[0];
  return {
    avg_sentence_length: row.avg_sentence_length || 12,
    formality_score: row.formality_score || 0.5,
    uses_contractions: row.uses_contractions !== false,
    top_phrases: row.top_phrases ? JSON.parse(row.top_phrases) : [],
    greetings: row.greetings ? JSON.parse(row.greetings) : ["Hey"],
    signoffs: row.signoffs ? JSON.parse(row.signoffs) : ["Thanks"],
  };
}

New users get sensible defaults. The profile evolves as they accept or reject formatting suggestions.
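
Writing an evolved profile back is a single statement. Here’s a sketch, assuming the same mysql2 connection as above; because TiDB is MySQL-compatible, INSERT ... ON DUPLICATE KEY UPDATE covers both the first write and every later update.

async function saveStyleProfile(userId: string, profile: StyleProfile): Promise<void> {
  await connection.execute(
    `INSERT INTO user_style_profiles
       (user_id, avg_sentence_length, formality_score, uses_contractions,
        top_phrases, greetings, signoffs)
     VALUES (?, ?, ?, ?, ?, ?, ?)
     ON DUPLICATE KEY UPDATE
       avg_sentence_length = VALUES(avg_sentence_length),
       formality_score     = VALUES(formality_score),
       uses_contractions   = VALUES(uses_contractions),
       top_phrases         = VALUES(top_phrases),
       greetings           = VALUES(greetings),
       signoffs            = VALUES(signoffs)`,
    [
      userId,
      profile.avg_sentence_length,
      profile.formality_score,
      profile.uses_contractions,
      JSON.stringify(profile.top_phrases),
      JSON.stringify(profile.greetings),
      JSON.stringify(profile.signoffs),
    ]
  );
}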

Prompt Engineering: Converting Style Statistics into Claude Instructions

The style profile turns into prompt instructions. Claude doesn’t see historical messages; it sees constraints.

function buildStylePrompt(profile: StyleProfile | null, context: string): string {
  if (!profile) {
    return `Format this transcript for ${context}. Keep it natural and conversational.`;
  }

  const formality = profile.formality_score > 0.7 ? "formal" :
                    profile.formality_score < 0.3 ? "casual" : "balanced";

  const contractionNote = profile.uses_contractions
    ? "Use contractions naturally (don't, won't, can't)."
    : "Minimize contractions for a more formal tone.";

  const greetingNote = profile.greetings.length > 0
    ? `Preferred greetings: ${profile.greetings.slice(0, 3).join(", ")}`
    : "";

  const signoffNote = profile.signoffs.length > 0
    ? `Preferred sign-offs: ${profile.signoffs.slice(0, 3).join(", ")}`
    : "";

  return `Format this transcript for ${context}.

User's writing style:
- Tone: ${formality}
- Average sentence length: ~${Math.round(profile.avg_sentence_length)} words
- ${contractionNote}
${greetingNote ? `- ${greetingNote}` : ""}
${signoffNote ? `- ${signoffNote}` : ""}

Rules:
1. ONLY add punctuation and paragraph breaks
2. Remove filler words: um, uh, like, basically, you know
3. Keep EVERY other word exactly as they said it
4. Do NOT rewrite, rephrase, or "clean up" their language`;
}

The rules at the bottom are critical. Without them, Claude will “improve” the user’s words. But people don’t want their voice replaced; they just want it cleaned up. There’s a difference.

Privacy-first AI keeps a user’s words without replacing their voice.

Each context also gets platform-specific instructions:

function getContextInstructions(context: string): string {
  switch (context) {
    case "email":
      return `Email format:
- Add punctuation and paragraph breaks
- Keep their exact words
- Add sign-off if missing`;

    case "slack":
      return `Slack format:
- Keep it brief and casual
- No formal greetings needed
- Okay to use shorter sentences`;

    case "twitter":
      return `Twitter/X format:
- Add punctuation only
- Keep their exact words
- If over 280 characters, don't trim`;
    
    // ... more contexts

    default:
      return ""; // unknown contexts get no extra platform instructions
  }
}

The combination of style profile and context instructions gives Claude enough guidance to format appropriately without overstepping.
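
Putting the pieces together, the formatting call itself is short. This is a sketch using the official @anthropic-ai/sdk client; the model string and token limit here are assumptions rather than values from the post.

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function formatTranscript(
  transcript: string,
  profile: StyleProfile | null,
  context: string
): Promise<string> {
  // Style constraints + platform instructions become the system prompt
  const systemPrompt =
    buildStylePrompt(profile, context) + "\n\n" + getContextInstructions(context);

  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514", // assumed model id
    max_tokens: 1024,
    system: systemPrompt,
    messages: [{ role: "user", content: transcript }],
  });

  // The SDK returns content blocks; the formatted transcript is the first text block
  const block = response.content[0];
  return block.type === "text" ? block.text : transcript;
}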

The Learning Loop: Using Weighted Averages for Style Adaptation

Here’s the part I’m still iterating on.

When a user accepts or rejects a format suggestion, I want to update their profile. The naive approach was to just overwrite the stats with the new sample.

But that was wrong.

If someone has been using the app for months and their profile reflects hundreds of accepted formats, a single new sample shouldn’t dramatically shift their stats. New samples need to have less influence as the profile matures.

The solution is weighted averaging. Each new sample contributes a fraction to the running average, with that fraction decreasing over time:

function updateStyleProfile(
  existingProfile: StyleProfile,
  newStats: TextStats,
  sampleCount: number
): StyleProfile {
  // Weight decreases as sample count increases
  // First sample: 100% weight. 100th sample: ~1% weight.
  const weight = 1 / (sampleCount + 1);
  
  return {
    avg_sentence_length: 
      existingProfile.avg_sentence_length * (1 - weight) + 
      newStats.avg_sentence_length * weight,
    formality_score:
      existingProfile.formality_score * (1 - weight) +
      calculateFormality(newStats) * weight,
    // ... other fields
  };
}

For phrases, greetings, and signoffs, I track frequency counts rather than just presence. A greeting you use once shouldn’t rank the same as one you use constantly.
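
Here’s a sketch of what that tracking can look like. The stored shape ({ phrase, count }) is my assumption; the post only says counts are kept rather than simple presence.

interface PhraseCount {
  phrase: string;
  count: number;
}

function bumpPhrase(existing: PhraseCount[], phrase: string): PhraseCount[] {
  const normalized = phrase.toLowerCase().trim();
  const hit = existing.find((p) => p.phrase === normalized);

  if (hit) {
    hit.count += 1;
  } else {
    existing.push({ phrase: normalized, count: 1 });
  }

  // Keep the list ranked so the prompt builder's slice(0, 3) picks true favorites
  return existing.sort((a, b) => b.count - a.count);
}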

I’m also generating embeddings for each accepted format:

const embeddingResponse = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: `Sentence length: ${stats.avg_sentence_length}. ` +
         `Formality: ${stats.formality_score}. ` +
         `Context: ${context}. ` +
         `Contractions: ${stats.uses_contractions}`,
});
const styleEmbedding = embeddingResponse.data[0].embedding;

The idea here is to cluster similar writing styles together. Users who write like you might have formatting preferences you’d also like. But I’ll be honest: this piece isn’t fully wired up yet. I’m generating the embeddings but not querying them for recommendations.

That’s the next iteration.
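
When that lands, the lookup will likely be a single SQL query. Here’s a sketch, assuming a hypothetical style_embeddings table with a TiDB VECTOR(1536) column holding the OpenAI embeddings; VEC_COSINE_DISTANCE is TiDB’s built-in cosine-distance function.

async function findSimilarStyles(styleEmbedding: number[]) {
  // Lower cosine distance = more similar writing style
  const [rows] = await connection.execute<RowDataPacket[]>(
    `SELECT user_id,
            VEC_COSINE_DISTANCE(embedding, ?) AS distance
     FROM style_embeddings
     ORDER BY distance
     LIMIT 5`,
    [JSON.stringify(styleEmbedding)] // '[0.01, -0.02, ...]' casts to VECTOR
  );
  return rows;
}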

The Result: A Cross-Platform, Privacy-First Voice-to-Text App

What works today:

  • Voice-to-text on 20+ platforms (Gmail, Slack, Notion, Twitter, LinkedIn, GitHub, and more)
  • Automatic context detection: no manual switching
  • Style profiles that influence formatting output
  • Privacy-first design: statistics only, no content stored

What’s next:

  • Vector similarity for style clustering (“users who write like you prefer…”)
  • Refined feedback loop for profile updates
  • Multi-language support beyond English
  • Browser support expansion (Firefox add-on)

You can find the code for the free version here.

Open Source and Getting Started: Build Your Own Transcription Tool

The main insight from building this app: personalization doesn’t require surveillance. You can learn patterns without learning secrets. Statistical fingerprints give you enough signal to customize behavior while keeping actual content out of your database entirely.

For enterprise use cases where privacy is non-negotiable, this approach opens doors that content-based learning keeps closed.

If you want to build something similar, TiDB Cloud Starter gives you enough runway to experiment. The combination of relational tables (for user profiles) and vector search (for style similarity) in one database simplified my architecture significantly.

