The Agent Harness

A 12-part engineering guide to building autonomous AI agent infrastructure. Claude Code is our case study; your agent is the goal.

All Parts

What Is an Agent Harness? Why Your LLM Needs More Than an API Call (Part 1)

Apr 03, 2026 14 min

Most LLM apps fail not because the model is wrong, but because there's nothing holding it together. Here's the architecture behind the infrastructure that makes autonomous agents actually work.

The Dialog Loop: The Heartbeat of Every Autonomous Agent (Part 2)

Apr 05, 2026 15 min

Every autonomous agent runs on a loop. Here's what that loop actually needs to do — and what Claude Code's implementation reveals about building one that holds up in production.

The Tool System: How Agents Act on the World (Part 3)

Apr 07, 2026 16 min

Without tools, an LLM can only produce text. Here's the engineering behind the tool system that turns Claude Code from a chatbot into an agent that acts — safely, concurrently, and with guarantees.

The Permission Pipeline: Safety That Doesn't Get in the Way (Part 4)

Apr 09, 2026 15 min

Autonomous agents need to act without constant interruption — but they also need guardrails. Here's how to design a permission system that provides both: safety that scales with the risk level, not a blunt on/off switch.

Configuration as Architecture: The Multi-Layer Settings Problem (Part 5)

Apr 11, 2026 13 min

Every enterprise app has a settings file. Agent harnesses need an architecture. Here's how Claude Code manages configuration across six layers, serving stakeholders that include users, projects, enterprises, and plugins, without collapsing into chaos.

The Memory System: How Agents Remember Across Sessions (Part 6)

Apr 13, 2026 14 min

Every session starts fresh — unless you build a memory system. Here's how to design one that stores what matters, skips what doesn't, and extracts memories without blocking your main loop.

Context Management: The Compression Problem (Part 7)

Apr 15, 2026 14 min

Every long-running agent eventually hits the context window ceiling. The question isn't whether — it's when, and how gracefully you handle it. Here's the four-level compression architecture that keeps agents running without crashing.

The Hook System: Extension Points That Don't Break the Core (Part 8)

Apr 17, 2026 16 min

Every operator has different requirements for how an agent should behave. The hook system is how you satisfy them without forking. Here's the architecture behind 26 lifecycle events, 5 hook types, and a security model that prevents operator customization from becoming an attack surface.

Sub-Agents, Coordinators, and Skills: Multi-Agent Orchestration (Part 9)

Apr 19, 2026 18 min

Single agents hit capability ceilings. Multi-agent systems hit coordination problems. Here's the architecture for both: the Fork pattern for parallel execution, the Coordinator pattern for enterprise orchestration, and skills + MCP for the capability extension layer.

Streaming Architecture: Building Agents That Feel Fast (Part 10)

Apr 21, 2026 15 min

An agent that takes 10 seconds to respond feels broken even if it's correct. Streaming isn't just a UX feature — it's an architectural choice that shapes every component. Here's how to build agents that feel fast.

Plan Mode: The Architecture of Thinking Before Acting (Part 11)

Apr 23, 2026 14 min

The most expensive agent mistakes happen in the first few turns, before the agent understands the full picture. Plan Mode is the architectural pattern that prevents premature action — and here's how it's built.

Build Your Own Agent Harness: The Practical Blueprint (Part 12)

Apr 25, 2026 16 min

Eleven posts of principles. One post of synthesis. The three questions every builder should answer before writing a line of code, the twelve design lessons Claude Code taught us, and the practical kit to start right.