
Building Long-Running AI Agents: The Initializer Plus Coding Agent Harness Pattern


Many long-running agent failures stem from context window exhaustion. The initializer-plus-coding-agent harness addresses this by splitting setup from execution. Here's how to build production harnesses that outlast your model vendor.

The Context Window Problem That Killed Long-Running Agents

Most long-running agent problems stem from treating agents like infinitely patient humans. Engineers build agents that receive every instruction in a single conversation, accumulate tool outputs across hours, and hit context limits before finishing critical tasks. The failure mode here is subtle but expensive: agents lose task context mid-execution, repeat work already completed, or crash with cryptic "context length exceeded" errors.

The pattern that teams overlook is separation of concerns between initialization and execution. Production agents need a harness that splits environment setup from incremental work, preserves state across sessions, and hands off context without reloading the entire conversation history. This matters because the alternative—a monolithic agent fighting context windows—fails predictably at scale.

Understanding the Initializer Plus Coding Agent Harness Pattern

The initializer-plus-coding-agent harness divides agent work into two distinct phases with separate context budgets. The initializer agent runs once per task to establish environment, scaffold files, and define the work plan. The coding agent then executes that plan incrementally across multiple sessions, each starting fresh with only essential context.

This distinction is critical. The initializer spends its context on setup operations that never need repeating: declaring dependencies, analyzing existing code structure, and generating project templates. The coding agent focuses entirely on implementation, loading only the current file, relevant test results, and the next work item from the plan.

%% alt: Two-phase agent architecture showing initializer setup followed by coding agent execution loop
flowchart TD
    Start([Task Request])
    Init[Initializer Agent]
    Scaffold[Scaffold Environment]
    Plan[Generate Work Plan]
    State[(Session State)]
    Code[Coding Agent Session 1]
    Code2[Coding Agent Session 2]
    Done([Task Complete])
    
    Start --> Init
    Init --> Scaffold
    Scaffold --> Plan
    Plan --> State
    State --> Code
    Code --> State
    State --> Code2
    Code2 --> Done
    
    classDef framework fill:#0b3b2e,stroke:#34d399,color:#d1fae5
    classDef dataStore fill:#3a2f0b,stroke:#fbbf24,color:#fef3c7
    classDef userAction fill:#142544,stroke:#7c9cf0,color:#eaf2ff
    
    class Init,Code,Code2 framework
    class State dataStore
    class Start,Done userAction

The harness itself is the orchestration layer that manages both agents, handles session boundaries, and maintains the state store between executions. In other words, the harness knows when to invoke the initializer versus when to spawn a new coding session, but it does not contain agent logic itself.
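As a rough illustration of that dispatch decision, here is a minimal sketch. The `HarnessState` shape, the `HarnessAction` union, and the `nextAction` function are hypothetical names introduced for this example; the only thing taken from the pattern is that the harness picks which agent runs and contains no agent logic.

```typescript
// Hypothetical shape of the state a harness might persist between runs.
interface HarnessState {
  initialized: boolean;     // has the initializer finished?
  pendingItemIds: string[]; // work items not yet completed, in plan order
}

type HarnessAction =
  | { kind: "run-initializer" }
  | { kind: "run-coding-session"; itemId: string }
  | { kind: "done" };

// Decide which agent to invoke next. No prompts or model calls live here.
function nextAction(state: HarnessState | undefined): HarnessAction {
  if (!state || !state.initialized) {
    return { kind: "run-initializer" };
  }
  const [first] = state.pendingItemIds;
  if (first === undefined) {
    return { kind: "done" };
  }
  return { kind: "run-coding-session", itemId: first };
}
```

Keeping this decision as a pure function of stored state means a crashed harness can restart, reload the state, and resume at the same point.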


Building the Initializer Agent: Environment Setup and Scaffolding

The initializer agent receives the task description and produces three artifacts: a scaffolded environment, a work plan, and a context summary for the coding agent. Engineers often over-complicate this phase by including execution logic—the initializer should never write production code, only prepare the workspace.

interface InitializerOutput {
  environment: {
    workingDirectory: string;
    dependencies: string[];
    scaffoldedFiles: string[];
  };
  workPlan: {
    items: Array<{
      id: string;
      description: string;
      files: string[];
      dependencies: string[];
    }>;
    totalEstimate: string;
  };
  contextSummary: {
    projectStructure: string;
    technicalConstraints: string[];
    keyDecisions: string[];
  };
}
 
async function runInitializer(
  taskDescription: string,
  tools: Tool[]
): Promise<InitializerOutput> {
  const prompt = `
You are an initializer agent. Your job is SETUP ONLY—no implementation.
 
Task: ${taskDescription}
 
Your outputs:
1. Scaffold the project structure (mkdir, touch files, package.json)
2. Generate a work plan with specific file-level tasks
3. Summarize the project for the coding agent
 
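Do not write implementation code. When setup is complete, output INITIALIZATION_COMPLETE and stop.
  `.trim();
 
  const response = await runAgent({
    prompt,
    tools,
    maxTurns: 20,
    // Check assistant messages only, so the sentinel in the prompt itself
    // cannot end the run early (assumes each message carries a `role`).
    stopCondition: (history) =>
      history.some(
        (msg) =>
          msg.role === "assistant" &&
          msg.content.includes("INITIALIZATION_COMPLETE")
      ),
  });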
 
  return parseInitializerOutput(response);
}

The critical detail here is the stop condition. The initializer must signal completion explicitly—relying on the model to "know when it's done" leads to wasted turns on unnecessary analysis. Tools available to the initializer should be filesystem-only: mkdir, writeFile, readFile, listDirectory. Network access and code execution are forbidden.

Implementing the Coding Agent: Incremental Progress and Session Artifacts

The coding agent operates in bounded sessions, each addressing a single work item from the plan. Engineers fail here by passing the entire conversation history forward—this defeats the purpose of session separation. Instead, each coding session receives only the current work item, relevant file contents, and test results from the previous session.

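// A work item is one entry of the initializer's work plan.
type WorkItem = InitializerOutput["workPlan"]["items"][number];
 
interface CodingSessionInput {
  workItem: WorkItem;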
  relevantFiles: Map<string, string>;
  previousResults?: {
    testsRun: string[];
    errors: string[];
  };
  contextSummary: string; // InitializerOutput.contextSummary, serialized to text by the harness
}
 
interface CodingSessionOutput {
  filesModified: string[];
  testsRun: string[];
  nextWorkItem?: string;
  blockers: string[];
}
 
async function runCodingSession(
  input: CodingSessionInput,
  tools: Tool[]
): Promise<CodingSessionOutput> {
  const prompt = `
You are a coding agent working on a single task. Context is limited—focus only on this work item.
 
Current task: ${input.workItem.description}
 
Relevant files loaded:
${Array.from(input.relevantFiles.entries())
  .map(([path, content]) => `${path}:\n${content}`)
  .join("\n\n")}
 
${
  input.previousResults
    ? `Previous session results:\n- Tests: ${input.previousResults.testsRun.join(", ")}\n- Errors: ${input.previousResults.errors.join(", ")}`
    : ""
}
 
Project context: ${input.contextSummary}
 
Complete this work item, run tests, then stop. Signal SESSION_COMPLETE when done.
  `.trim();
 
  const response = await runAgent({
    prompt,
    tools,
    maxTurns: 50,
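    // Match assistant messages only: the prompt above also contains the
    // sentinel (assumes each message carries a `role` field).
    stopCondition: (history) =>
      history.some(
        (msg) =>
          msg.role === "assistant" && msg.content.includes("SESSION_COMPLETE")
      ),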
  });
 
  return parseCodingOutput(response);
}

The key distinction: coding sessions are stateless except for the explicit inputs. The agent does not remember previous sessions—all continuity comes from the work plan and the session artifacts stored by the harness. This pattern prevents context accumulation and makes each session independently debuggable.
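Because continuity lives in the work plan, the harness needs a way to pick the next item without consulting any earlier conversation. The sketch below is one possible approach. It assumes that each item's `dependencies` array holds the ids of other work items that must finish first, which the plan format does not state explicitly, and `nextReadyItem` is a hypothetical helper name.

```typescript
interface PlanItem {
  id: string;
  description: string;
  files: string[];
  dependencies: string[]; // assumed: ids of items that must be completed first
}

// Return the first item that is not completed and whose dependencies all are.
// Returns undefined when the plan is finished or every remaining item is blocked.
function nextReadyItem(
  items: PlanItem[],
  completedIds: Set<string>
): PlanItem | undefined {
  return items.find(
    (item) =>
      !completedIds.has(item.id) &&
      item.dependencies.every((dep) => completedIds.has(dep))
  );
}
```

The set of completed ids would come from stored session artifacts, so a fresh session can be started for the returned item with no memory of earlier ones.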


State Management Between Sessions: Files, Context, and Handoff Strategies

The harness manages state through three mechanisms: the filesystem (persistent work artifacts), a lightweight context store (project summary and work plan), and session artifacts (test results and error logs). Teams often reach for a database here, but for agent state at this scale that is usually unnecessary complexity.

%% alt: State flow between coding agent sessions showing filesystem, context store, and artifact handoff
flowchart TD
    Session1[Coding Session 1]
    FS[(Filesystem)]
    Context[(Context Store)]
    Artifacts[(Session Artifacts)]
    Harness[Harness Orchestrator]
    Session2[Coding Session 2]
    
    Session1 -->|Write code| FS
    Session1 -->|Test results| Artifacts
    Session1 -->|Complete| Harness
    Harness -->|Read files| FS
    Harness -->|Read context| Context
    Harness -->|Read results| Artifacts
    Harness -->|Prepare inputs| Session2
    Session2 -->|Write code| FS
    Session2 -->|Test results| Artifacts
    
    classDef framework fill:#0b3b2e,stroke:#34d399,color:#d1fae5
    classDef dataStore fill:#3a2f0b,stroke:#fbbf24,color:#fef3c7
    
    class Session1,Session2,Harness framework
    class FS,Context,Artifacts dataStore

The filesystem stores all code, tests, and configuration files—this is the source of truth. The context store holds the project summary from initialization (typically 500-1000 tokens) and the work plan. Session artifacts capture test outputs, error messages, and the agent's self-reported completion status. The harness reads these three sources to construct the next session's input.
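A small sketch of how a harness might record a finished session and update the context store. Everything here is illustrative: the `FileStore` interface, the `context.json` and `artifacts/` paths, and the `recordSession` helper are assumptions made for the example. An in-memory store stands in for the filesystem so the sketch runs on its own; a real harness would back the same interface with files on disk.

```typescript
// Minimal storage abstraction (hypothetical). Swap in real file I/O in production.
interface FileStore {
  read(path: string): string | undefined;
  write(path: string, contents: string): void;
}

interface SessionArtifact {
  workItemId: string;
  testsRun: string[];
  errors: string[];
  status: "complete" | "blocked"; // the agent's self-reported status
}

interface ContextStore {
  projectSummary: string;
  completedItemIds: string[];
}

function inMemoryStore(): FileStore {
  const data = new Map<string, string>();
  return {
    read: (path) => data.get(path),
    write: (path, contents) => {
      data.set(path, contents);
    },
  };
}

// Persist one session's artifact and mark its work item complete if it succeeded.
function recordSession(store: FileStore, artifact: SessionArtifact): ContextStore {
  const raw = store.read("context.json");
  const context: ContextStore = raw
    ? JSON.parse(raw)
    : { projectSummary: "", completedItemIds: [] };
  if (
    artifact.status === "complete" &&
    !context.completedItemIds.includes(artifact.workItemId)
  ) {
    context.completedItemIds.push(artifact.workItemId);
  }
  store.write(`artifacts/${artifact.workItemId}.json`, JSON.stringify(artifact));
  store.write("context.json", JSON.stringify(context));
  return context;
}
```

Plain JSON files keep every handoff inspectable with ordinary tools, which is most of what debugging a stalled session requires.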

The handoff strategy determines which files the next session loads. Static analysis of import statements works for most cases—load the file being modified plus its direct imports. For test-driven workflows, load the test file, the implementation file, and any failing test output. The failure mode here is loading too many files—stay under 10k tokens of file content per session.
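The import-based handoff can be approximated with a regular expression and a rough token estimate. This is a sketch under stated assumptions: the four-characters-per-token ratio is a crude heuristic rather than a real tokenizer, the regex only handles static ES module `import` statements with relative paths, and `resolve` is a caller-supplied function because path resolution depends on the project.

```typescript
// Crude estimate (assumption): roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Collect relative specifiers from static ES module imports.
function directImports(source: string): string[] {
  const pattern = /import\s+(?:[^'"]*?\s+from\s+)?['"](\.[^'"]+)['"]/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1];
    if (specifier) found.push(specifier);
  }
  return found;
}

// Load the target file plus its direct imports, skipping any file that
// would push the total past the token budget.
function selectFiles(
  target: string,
  files: Map<string, string>,
  resolve: (specifier: string) => string,
  budgetTokens = 10000
): string[] {
  const source = files.get(target);
  if (source === undefined) return [];
  const selected = [target];
  let used = estimateTokens(source);
  for (const specifier of directImports(source)) {
    const path = resolve(specifier);
    const content = files.get(path);
    if (content === undefined || selected.includes(path)) continue;
    const cost = estimateTokens(content);
    if (used + cost > budgetTokens) continue;
    selected.push(path);
    used += cost;
  }
  return selected;
}
```

A production version would more likely use the TypeScript compiler API or a language server for import resolution; the regex keeps the sketch short.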

Related patterns for managing multi-agent state: Claude Code Subagents: Building Agent Teams for Parallel Coding

Harness vs SDK vs Framework: When to Use Each Approach

The distinction between harness, SDK, and framework matters for maintainability and vendor lock-in. A harness is orchestration code you control that calls model APIs directly. An SDK is a vendor-provided library that handles API calls but exposes configuration. A framework makes architectural decisions for you—routing, state management, tool execution—and abstracts the model entirely.

%% alt: Comparison of harness, SDK, and framework architectures showing control vs abstraction tradeoffs
flowchart LR
    subgraph Harness["Harness: Full Control"]
        H1[Your Orchestration Code]
        H2[Direct API Calls]
        H3[Custom State Management]
    end
    
    subgraph SDK["SDK: Vendor API with Config"]
        S1[Vendor Library]
        S2[Configurable Prompts]
        S3[Standard Tool Interface]
    end
    
    subgraph Framework["Framework: Opinionated Stack"]
        F1[Framework Router]
        F2[Built-in State Store]
        F3[Abstracted Model Layer]
    end
    
    H1 --> H2
    H2 --> H3
    S1 --> S2
    S2 --> S3
    F1 --> F2
    F2 --> F3
    
    classDef userAction fill:#142544,stroke:#7c9cf0,color:#eaf2ff
    classDef framework fill:#0b3b2e,stroke:#34d399,color:#d1fae5
    classDef dataStore fill:#3a2f0b,stroke:#fbbf24,color:#fef3c7
    
    class H1,H2,H3 userAction
    class S1,S2,S3 framework
    class F1,F2,F3 dataStore

Choose a harness when you need full control over session boundaries, stop conditions, and state persistence. The initializer-plus-coding-agent pattern is a harness pattern—engineers write explicit orchestration logic. Choose an SDK when you want vendor-optimized features (streaming, caching, multi-agent coordination) but need to customize prompts and tools. Choose a framework when standardization across teams matters more than flexibility.

The implication here is portability. Harnesses require more code but switch between model providers in hours. SDKs lock you to a vendor's API surface but reduce boilerplate. Frameworks offer the fastest initial development but make model migration a multi-week refactor. For production systems expected to outlast any single AI vendor, harnesses win.
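To make the portability argument concrete, here is a minimal sketch of the abstraction boundary a harness might draw. The `ModelClient` interface, the two stand-in adapters, and `runTurn` are hypothetical; neither adapter calls a real vendor SDK, and a real one would wrap that vendor's HTTP API behind the same interface.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Provider-neutral interface (hypothetical) that the harness depends on.
interface ModelClient {
  readonly name: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Stand-in adapters for illustration only; they return canned text.
const vendorA: ModelClient = {
  name: "vendor-a",
  complete: async (messages) => `A saw ${messages.length} message(s)`,
};

const vendorB: ModelClient = {
  name: "vendor-b",
  complete: async (messages) => `B saw ${messages.length} message(s)`,
};

// Orchestration code touches only ModelClient, never a vendor type.
async function runTurn(
  client: ModelClient,
  history: ChatMessage[],
  userInput: string
): Promise<ChatMessage[]> {
  const next: ChatMessage[] = [...history, { role: "user", content: userInput }];
  const reply = await client.complete(next);
  return [...next, { role: "assistant", content: reply }];
}
```

Switching providers then means writing one new adapter and changing which client is passed to `runTurn`; session boundaries, state handling, and stop conditions stay untouched.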

Context on modern SDK capabilities: MCP SDK v2: Streamable HTTP and Session Resumption

Production Considerations: Error Recovery, Tool Allowlists, and Stop Conditions

Production harnesses fail without explicit error recovery and safety constraints. The most common failures: agents that retry infinitely on transient errors, agents that invoke dangerous tools (file deletion, network calls), and agents that run past completion because stop conditions are too vague.

%% alt: Production harness safety architecture showing error recovery, tool allowlist, and stop condition enforcement
flowchart TD
    Request[Agent Request]
    Allowlist{Tool in Allowlist?}
    Execute[Execute Tool]
    Error{Error Occurred?}
    Retry{Retry Count < 3?}
    Recovery[Error Recovery Handler]
    Stop{Stop Condition Met?}
    Complete[Session Complete]
    Fail[Session Failed]
    
    Request --> Allowlist
    Allowlist -->|Yes| Execute
    Allowlist -->|No| Fail
    Execute --> Error
    Error -->|No| Stop
    Error -->|Yes| Retry
    Retry -->|Yes| Recovery
    Retry -->|No| Fail
    Recovery --> Execute
    Stop -->|Yes| Complete
    Stop -->|No| Request
    
    style Fail stroke:#ef4444,fill:#450a0a,color:#fca5a5
    
    classDef framework fill:#0b3b2e,stroke:#34d399,color:#d1fae5
    classDef userAction fill:#142544,stroke:#7c9cf0,color:#eaf2ff
    
    class Request,Execute,Recovery framework
    class Complete,Fail userAction

Tool allowlists define exactly which operations each agent type can perform. The initializer gets filesystem setup tools only: no code execution, no network access. The coding agent gets file read/write, test execution, and language server queries. The harness enforces these allowlists before executing any tool call the model requests.
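A minimal sketch of that enforcement step. The initializer list mirrors the filesystem tools named earlier (`mkdir`, `writeFile`, `readFile`, `listDirectory`); the coding agent's `runTests` and `queryLanguageServer` names are invented for this example, as are `ToolCall` and `checkToolCall`.

```typescript
type AgentKind = "initializer" | "coding";

// Per-agent allowlists. Coding-agent tool names are illustrative assumptions.
const TOOL_ALLOWLIST: Record<AgentKind, readonly string[]> = {
  initializer: ["mkdir", "writeFile", "readFile", "listDirectory"],
  coding: ["readFile", "writeFile", "runTests", "queryLanguageServer"],
};

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ToolDecision = { allowed: true } | { allowed: false; reason: string };

// Run this check in the harness before any tool call is executed.
function checkToolCall(agent: AgentKind, call: ToolCall): ToolDecision {
  if (TOOL_ALLOWLIST[agent].includes(call.name)) {
    return { allowed: true };
  }
  return {
    allowed: false,
    reason: `Tool "${call.name}" is not permitted for the ${agent} agent`,
  };
}
```

Returning a reason rather than throwing lets the harness record the rejection in the session artifacts before failing the session.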

Error recovery requires distinguishing transient from permanent failures. Transient errors—network timeouts, rate limits, temporary file locks—retry with exponential backoff (3 attempts maximum). Permanent errors—syntax errors, missing dependencies, invalid tool arguments—surface immediately to the harness with context for the next session. Agents that retry indefinitely on syntax errors waste thousands of tokens on impossible tasks.
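The retry policy can be expressed as a small pure function. In this sketch the error codes, the 500 ms base delay, and the names `classify`, `backoffMs`, and `decideRetry` are all assumptions for illustration; a real harness would map its own tool and API errors onto the two categories.

```typescript
type FailureKind = "transient" | "permanent";

// Illustrative codes only: timeouts, rate limits, temporary file locks.
const TRANSIENT_CODES = ["ETIMEDOUT", "RATE_LIMITED", "FILE_LOCKED"];

function classify(code: string): FailureKind {
  return TRANSIENT_CODES.includes(code) ? "transient" : "permanent";
}

// Delay before the next try after `attempt` failed tries: base, 2x, 4x, ...
function backoffMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** (attempt - 1);
}

type RetryDecision =
  | { retry: true; delayMs: number }
  | { retry: false; reason: string };

// attemptsSoFar counts tries already made, including the one that just failed.
function decideRetry(
  code: string,
  attemptsSoFar: number,
  maxAttempts = 3
): RetryDecision {
  if (classify(code) === "permanent") {
    return { retry: false, reason: `permanent failure: ${code}` };
  }
  if (attemptsSoFar >= maxAttempts) {
    return { retry: false, reason: `gave up after ${attemptsSoFar} attempts` };
  }
  return { retry: true, delayMs: backoffMs(attemptsSoFar) };
}
```

With these defaults a transient error is retried after 500 ms and then 1000 ms, and the third failure ends the session. Unknown codes are treated as permanent, so the harness fails fast rather than burning tokens on an error it does not understand.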

Stop conditions must be explicit and token-efficient. Bad: "stop when the task feels complete." Good: "stop when you output SESSION_COMPLETE or reach 50 turns." Better: "stop when all tests pass or you output BLOCKED with a reason." The harness enforces these conditions—models cannot reliably self-terminate.
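One way the harness might enforce those conditions itself, checked after every turn. The `TurnSnapshot` shape and `checkStop` function are hypothetical, and the sketch assumes the harness learns the test status from its own test runner rather than from the model's say-so.

```typescript
interface TurnSnapshot {
  turn: number;                 // turns used so far in this session
  lastAssistantMessage: string; // most recent model output
  allTestsPassed: boolean;      // reported by the harness's test runner
}

type StopReason =
  | "tests-passed"
  | "blocked"
  | "signalled-complete"
  | "turn-limit";

// Returns a reason to stop, or undefined to continue the session.
function checkStop(
  snapshot: TurnSnapshot,
  maxTurns = 50
): StopReason | undefined {
  if (snapshot.allTestsPassed) return "tests-passed";
  // BLOCKED must start a line, so prose that merely mentions it is ignored.
  if (/^BLOCKED\b/m.test(snapshot.lastAssistantMessage)) return "blocked";
  if (snapshot.lastAssistantMessage.includes("SESSION_COMPLETE")) {
    return "signalled-complete";
  }
  if (snapshot.turn >= maxTurns) return "turn-limit";
  return undefined;
}
```

The turn limit is checked last, so a session that finishes on its final allowed turn is recorded as a success rather than a timeout.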

Background on large context windows and why session boundaries still matter: 2 Million Token Context Windows in Real Web Apps

Building Harnesses That Outlast Your Model Vendor

That covers the essential patterns for production agent harnesses. The initializer-plus-coding-agent approach solves context exhaustion by separating setup from execution and bounding each session to a single work item. State flows through the filesystem, a lightweight context store, and session artifacts—not conversation history. Tool allowlists, explicit stop conditions, and error recovery with retry budgets prevent the most common production failures.

Engineers who build custom harnesses instead of relying on frameworks gain portability and control. When Anthropic releases a better API or OpenAI ships a faster model, harness-based systems can migrate in hours rather than weeks. The abstraction boundary is the orchestration layer, not the model provider.

Apply these patterns in production and the difference should show up quickly. Agents that previously stalled at 100k tokens can complete multi-file refactors that span millions of tokens across sessions. Tasks that required manual intervention every 30 minutes can run unsupervised overnight. The harness is not glamorous infrastructure, but it is what separates prototype agents from production systems.