
Key Takeaways

  • The harness, not the model, determines whether agent state is portable and recoverable.
  • Memory is runtime state that belongs in the architecture from the start, not bolted on later.
  • State lock-in is the real risk: switching frameworks can mean losing the agent’s entire operating history.
  • Framework migrations and task interruptions both trace back to the same root cause: state stored in the wrong place.

For the first wave of AI applications, the model was the only architectural question that mattered. Which model reasons better. Which one writes cleaner code. Which one calls tools more reliably. Which one has the larger context window. Which one costs less per task.

That conversation still matters. But it is no longer enough.

The systems attracting the most serious infrastructure attention now are not just models. They are harnessed systems: models surrounded by tools, memory, context management, sandboxes, files, permissions, recovery logic, evaluators, and feedback loops.

A harness turns a model into a tool-using, long-running agent. Once that happens, the hardest question changes: where does the agent’s runtime state live, and who controls it?

That question is becoming one of the defining architecture choices in AI systems today.

Consider two situations that come up in production.

You ship a working agent on LangGraph. Three months later, Anthropic releases a model that beats your current one on tool use, and your team wants to switch frameworks to take advantage of it. The agent’s memory (conversation history, user preferences, task checkpoints) lives inside LangGraph’s internal state format. Switching means one of two things: rewrite the memory layer to match the new harness, or start every user back at turn zero.

Or your coding agent runs a four-hour task and gets interrupted at hour two. On restart, it has no record of what it already tried, which files it modified, or what it decided to skip. It starts over and repeats the same failed paths.

In both cases, the problem is not the model. It is where the agent’s state lived.

The Harness Is the Application Layer

An agent harness is not a thin wrapper around a model. It decides what the model can see, which tools it can call, what prior work enters context, how tool outputs are represented, where intermediate artifacts go, when memory is retrieved, and how the agent resumes after interruption.

OpenAI describes the Codex harness as the core agent loop and execution logic: the part that coordinates user input, model inference, tool calls, tool outputs, conversation history, and context window management. Anthropic makes a similar distinction in its work on agent evals, noting that when you evaluate an agent, you are evaluating the model and the harness together.

That distinction matters:

  • A model produces tokens.
  • A harness produces behavior.

The model is where reasoning happens. The harness is where reasoning becomes work: reading files, invoking tools, calling MCP servers, testing outputs, writing summaries, deciding what carries forward. Stronger models do not eliminate the harness. They raise the ceiling on what the harness can coordinate.

Long-Running Agents Create State Whether You Plan for It or Not

Every useful harness produces state.

  • Some of it is obvious: conversation history, retrieved context, tool outputs, task plans, execution logs, generated files, user preferences, memory records.
  • Some of it is less obvious: what the agent already tried, what it abandoned, what assumptions were true, what was compressed out of context, what artifacts belong to which decision, what should be visible to the next session.

In demos, that state often lives wherever it is easiest to put it: Markdown files, JSON blobs, SQLite, a vector store, a hidden harness directory, a provider-managed thread, a progress file in the repo. For early prototypes, that is often fine. The problem starts when temporary state becomes runtime state.

Anthropic’s work on long-running agents shows the failure mode clearly. Their coding agents needed progress files, feature lists, git history, initialization scripts, and structured handoff artifacts so that each new session could understand what happened before. Without that, each context window began with too little usable memory of the previous one. The issue was not model intelligence. The runtime did not preserve enough usable state.
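A progress file only helps if the next session can actually interrogate it. A minimal sketch of the same idea as structured rows, so a new session can ask "what already failed?" instead of re-parsing free-form notes. This is illustrative, not Anthropic's implementation: the table and column names are invented, and SQLite stands in for any SQL store.

```python
import json
import sqlite3

# Hypothetical handoff record: what each session tried, what it abandoned,
# and which files it touched, queryable by the next session.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE handoffs (
        session_id TEXT,
        step       INTEGER,
        action     TEXT,   -- what the agent did
        outcome    TEXT,   -- 'done', 'failed', 'skipped'
        detail     TEXT    -- JSON: files touched, assumptions, notes
    )
""")

# The previous session records what it tried and what it gave up on.
conn.executemany(
    "INSERT INTO handoffs VALUES (?, ?, ?, ?, ?)",
    [
        ("s1", 1, "run full test suite", "failed",
         json.dumps({"error": "timeout"})),
        ("s1", 2, "split tests into batches", "done",
         json.dumps({"files": ["tests/test_api.py"]})),
    ],
)

# The next session starts by asking what already failed, in order.
failed = conn.execute(
    "SELECT action FROM handoffs WHERE outcome = 'failed' ORDER BY step"
).fetchall()
print(failed)  # [('run full test suite',)]
```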

That is the shift production teams are now facing.

Memory Is Not a Plugin

The market is converging on a sharper point: memory cannot be added to an agent as an afterthought.

LangChain’s analysis of harness architecture argues that memory is tightly coupled to how harnesses manage context and state. If memory lives inside a closed harness or proprietary API, developers lose control of the state that makes their agent useful.

Memory sits in the path between the system and the model. Before the model acts, the harness decides what to retrieve, summarize, filter, compress, and include. After the model acts, the harness decides what to store, update, discard, or expose to future runs.

But once you see memory this way, the word starts to feel too narrow. The harness is managing runtime state: session state, task history, user profiles, permissions, retrieval metadata, tool outputs, workspace files, generated artifacts, recovery checkpoints.
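That retrieve-act-store loop can be sketched in a few lines. Everything here is illustrative: `call_model` is a placeholder for any model API, and SQLite stands in for the state substrate. The point is the shape of the loop, with memory sitting in the path on both sides of the model call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory (session_id TEXT, turn INTEGER, role TEXT, content TEXT)"
)

def call_model(context):
    # Placeholder: a real harness would send `context` to a model API.
    return f"reply to: {context[-1][1]}"

def harness_step(session_id, user_input):
    # Before the model acts: the harness decides what to retrieve and include.
    rows = conn.execute(
        "SELECT role, content FROM memory WHERE session_id = ? ORDER BY turn",
        (session_id,),
    ).fetchall()
    context = rows + [("user", user_input)]

    reply = call_model(context)

    # After the model acts: the harness decides what to store for future runs.
    turn = len(rows)
    conn.executemany(
        "INSERT INTO memory VALUES (?, ?, ?, ?)",
        [(session_id, turn, "user", user_input),
         (session_id, turn + 1, "assistant", reply)],
    )
    return reply

print(harness_step("s1", "hello"))     # reply to: hello
print(harness_step("s1", "continue"))  # second turn sees the first in context
```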

That is why “just add a vector database” is an incomplete answer. Vector search is useful for semantic recall. But agent state is not purely semantic. A production agent also needs:

  • Exact filters and chronology: what happened in what order
  • Ownership and transactions: which session wrote what
  • Permissions and auditability: who can read which records
  • Structured querying: inspection and reporting across history

The tier that most often lacks this is structured, queryable control-plane state: task history, permissions, user profiles, audit trails, recovery checkpoints. These are the records that make an agent’s operating history inspectable and portable across harnesses.
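Each of those four properties maps to a query a plain SQL table answers directly and a pure vector index does not. A sketch under assumed, illustrative names (the schema is not prescribed anywhere in the article; SQLite stands in for any SQL-compatible store):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task_history (
        id         INTEGER PRIMARY KEY,  -- chronology: what happened in what order
        session_id TEXT,                 -- ownership: which session wrote what
        owner      TEXT,                 -- permissions: who may read the record
        event      TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
""")
conn.executemany(
    "INSERT INTO task_history (session_id, owner, event) VALUES (?, ?, ?)",
    [("s1", "alice", "plan created"),
     ("s1", "alice", "tool: run_tests"),
     ("s2", "bob",   "plan created")],
)

# Exact filter plus chronology: everything session s1 did, in order.
events = [r[0] for r in conn.execute(
    "SELECT event FROM task_history WHERE session_id = 's1' ORDER BY id")]
print(events)  # ['plan created', 'tool: run_tests']

# Structured reporting across history: activity per owner.
report = conn.execute(
    "SELECT owner, COUNT(*) FROM task_history GROUP BY owner ORDER BY owner"
).fetchall()
print(report)  # [('alice', 2), ('bob', 1)]
```

None of these questions have a natural formulation as nearest-neighbor search, which is why the structured tier complements rather than replaces vector recall.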

The Real Risk Is State Lock-In

It is tempting to frame the harness debate as open source versus closed source. That framing is too simple. The deeper issue is state lock-in.

For CTOs, architectural tech leads, and platform engineers, the practical questions are:

  • Can we inspect what the agent stored?
  • Can we query memory across users, sessions, tools, and workspaces?
  • Can we migrate state from one harness to another?
  • Can we separate model choice from memory ownership?
  • Can we keep using the state if we change model providers?

If the answer is no, the harness is not only orchestrating the agent. It is becoming the system of record for the agent’s operating history. That may be acceptable for early experiments. It is much harder to accept once agents participate in engineering, support, research, data operations, or customer-facing workflows.

Tool-Calling Agents Make State Governance Visible

Once agents call external tools at scale, the platform needs clear answers for identity, authorization, audit trails, policy enforcement, and backend persistence. Agent workloads are shaped by runtime inputs, not just static code, which means infrastructure-level controls matter.

A better design treats state as a first-class substrate beneath the harness: which agent called which tool, under which identity, with which input, visible to which future run. Without that record, debugging and compliance become guesswork.
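A minimal sketch of such a record, with invented names throughout (agent IDs, identities, and columns are illustrative, and SQLite stands in for the substrate). The compliance question becomes a query rather than guesswork:

```python
import json
import sqlite3

# First-class tool-call record: which agent called which tool, under which
# identity, with which input, visible to which future run.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tool_calls (
        agent_id TEXT,
        identity TEXT,   -- the credential the call ran under
        tool     TEXT,
        input    TEXT,   -- JSON arguments
        run_id   TEXT    -- which future run may see this output
    )
""")

conn.execute(
    "INSERT INTO tool_calls VALUES (?, ?, ?, ?, ?)",
    ("coder-1", "svc-deploy", "shell", json.dumps({"cmd": "pytest"}), "run-42"),
)

# Audit question: what has this identity ever executed, via which agent?
rows = conn.execute(
    "SELECT agent_id, tool, input FROM tool_calls WHERE identity = 'svc-deploy'"
).fetchall()
print(rows)  # [('coder-1', 'shell', '{"cmd": "pytest"}')]
```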

What Memory Benchmarks Confirm

Benchmarks like LoCoMo confirm empirically what production experience already shows: memory quality depends heavily on how the agent manages context and uses tools, not only on the retrieval mechanism.

The implication is direct. Improving memory means improving the harness, and the harness needs a durable, queryable, portable place to keep the state it manages.

The Missing Layer Is an Independent State Substrate

The answer is not to make every agent prototype heavy. It is to make the state explicit earlier.

A useful state substrate has four properties:

  1. Fast to create
  2. Queryable with standard SQL
  3. Independent from any single harness
  4. Accessible from standard MySQL-compatible drivers

The point is portability. The harness can evolve, the model can change, and the state the agent accumulated stays in a place that is inspectable and queryable.

This is what TiDB Cloud Zero is designed to be in the Agent State Stack: an instant, MySQL-compatible SQL substrate that any harness can connect to, and that teams can migrate away from without data loss if the architecture changes. For agents that need persistent memory across sessions, Mem9 sits on top of that substrate as a managed memory API for coding agents and custom tools. Where agents also produce files, artifacts, and documents, Drive9 extends the stack to cover those too.

No single layer solves the whole problem. But treating state as a separable substrate is the architectural decision worth making before the harness becomes the system of record.

The two failure modes at the beginning of the article illustrate exactly this. In the migration case: if memory lives in a standard, queryable substrate rather than inside the harness’s internal format, it does not need to be rebuilt when the harness changes. It stays legible and portable. In the interrupted-task case: a durable state layer means the agent’s progress is a record it can read on restart, not session history it has to reconstruct from scratch.
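The interrupted-task case reduces to a small pattern: write a checkpoint after each step, and on restart derive the remaining work from the record instead of starting over. A sketch with an invented plan and table (SQLite again standing in for the durable layer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkpoints (task_id TEXT, step TEXT, status TEXT)")

# Illustrative four-step plan for a long-running coding task.
PLAN = ["clone repo", "run tests", "fix failures", "open PR"]

def complete(task_id, step):
    # Durable checkpoint written as each step finishes.
    conn.execute("INSERT INTO checkpoints VALUES (?, ?, 'done')", (task_id, step))

def resume(task_id):
    # On restart: skip whatever the record says is already done.
    done = {r[0] for r in conn.execute(
        "SELECT step FROM checkpoints WHERE task_id = ? AND status = 'done'",
        (task_id,))}
    return [s for s in PLAN if s not in done]

# The task runs for two hours, finishes two steps, then is interrupted.
complete("t1", "clone repo")
complete("t1", "run tests")

print(resume("t1"))  # ['fix failures', 'open PR'] -- no repeated failed paths
```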

When Does a Dedicated State Substrate Matter?

Not every agent needs this on day one. A dedicated state substrate becomes important when the agent crosses one or more of these lines:

  • Must resume across sessions
  • Memory must follow users across machines
  • Multiple agents need shared context
  • MCP tools need backend persistence
  • Tool outputs need to be queried later
  • The agent must survive model or harness changes
  • Provider-managed memory becomes a lock-in concern

If none of those apply, local files and a simple store may be enough. Once one applies, the architectural debt starts compounding.

The Question to Ask Before Choosing a Harness

As the harness becomes the application layer, it needs an independent state substrate to match.

Before choosing the next agent framework or MCP architecture, ask:

What state will this agent create, and will we still control it three months from now?

  • If the state lives only inside a closed API, you may move quickly but lose portability.
  • If it lives only in local files, you may preserve visibility but lose durability.
  • If it lives in an independent, queryable substrate, the harness can evolve, the model can change, and the agent’s operating history remains under your control.

That is what an AI harness actually needs beyond a model. Not more glue. A state layer the builder can own. If you are ready to test what an independent state layer looks like in practice, TiDB Cloud Zero spins up an instant, MySQL-compatible SQL substrate you can use for memory, tool outputs, and retrieval prototypes, and you can connect any harness to it today.

