Building Production AI Agents with the Claude Agent SDK and MCP: A TypeScript Deep Dive

The Claude Agent SDK provides the building blocks for taking AI agents from proof of concept to production. This guide covers the architecture, MCP integration patterns, and deployment strategies that separate demo agents from robust ones.

Many AI agent projects stall when developers treat agents as chatbots with extra steps. The Claude Agent SDK exists because production agents need session management, tool orchestration, and permission boundaries that a plain LLM completion API leaves to the caller. Teams that skip these fundamentals tend to ship agents that leak context between users, retry indefinitely on errors, or execute dangerous operations without validation.

The distinction between a demo and a production agent comes down to three architectural decisions: how sessions persist state, how tools integrate without fragile string parsing, and how permission models prevent unintended actions. The Claude Agent SDK makes these decisions explicit through its session API, Model Context Protocol (MCP) server integration, and hook system. When developers ignore these layers, they rebuild worse versions of the same patterns.

What the Claude Agent SDK Actually Is (And Why It's Not Just for Coding)

The Claude Agent SDK is a TypeScript framework for building stateful, tool-enabled AI agents that run multi-turn conversations with external system access. Unlike wrapper libraries that abstract Claude's API, the SDK provides session primitives, tool registration patterns, and an event loop designed for agents that maintain context across hours or days.

The critical insight: agents need memory that survives process restarts. A customer service agent handling a refund request must remember the order ID, previous attempts, and approval status across multiple interactions. The SDK's session system persists this state automatically, exposing methods like continueSession() and getSessionHistory() that handle serialization without custom database logic.

MCP servers extend this architecture by turning any external service into a first-class agent tool. Instead of writing brittle functions that parse LLM output and hope the format matches, developers implement standardized tool schemas that the agent calls with validated parameters. The SDK handles the marshaling, error propagation, and retry logic that breaks in hand-rolled implementations.

Core Architecture: Sessions, Tools, and the Agent Loop

The SDK organizes around three core abstractions: the Agent class that orchestrates LLM calls, Session objects that maintain conversation state, and Tool definitions that expose capabilities. The agent loop runs a simple cycle: receive user input, query Claude with available tools and session context, execute any tool calls Claude requests, then loop until Claude returns a final response.

This structure eliminates the common failure mode where developers manually track conversation history, forget to include tool results in follow-up prompts, or lose state when the process crashes. The SDK's runAgent() method handles all of this, exposing hooks for logging and permission checks without requiring reimplementation of the core loop.
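To make the cycle concrete, here is a minimal, self-contained sketch of such a loop. It is not the SDK's implementation: `runLoop`, `ModelFn`, `ToolFn`, and the transcript shape are hypothetical names invented for illustration, and the model is passed in as a plain function so the sketch runs without network access.

```typescript
// Illustrative only: every name below is hypothetical, not part of the SDK.
type ToolCall = { tool: string; input: Record<string, unknown> };
type ModelReply =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };
type Turn = { role: "user" | "assistant" | "tool"; content: string };

// Stand-in for a call to Claude: given the transcript, return the next step.
type ModelFn = (transcript: Turn[]) => Promise<ModelReply>;
type ToolFn = (input: Record<string, unknown>) => Promise<string>;

interface LoopResult {
  transcript: Turn[];
  finalText: string | null; // null when the turn limit was reached first
  turnsUsed: number;
}

async function runLoop(
  userMessage: string,
  model: ModelFn,
  tools: Record<string, ToolFn>,
  maxTurns: number,
): Promise<LoopResult> {
  const transcript: Turn[] = [{ role: "user", content: userMessage }];

  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = await model(transcript);

    if (reply.kind === "final") {
      transcript.push({ role: "assistant", content: reply.text });
      return { transcript, finalText: reply.text, turnsUsed: turn };
    }

    // Unknown tools and thrown errors are fed back to the model as tool
    // results instead of crashing the loop.
    const handler: ToolFn | undefined = tools[reply.call.tool];
    let result: string;
    try {
      result = handler
        ? await handler(reply.call.input)
        : `error: unknown tool "${reply.call.tool}"`;
    } catch (err) {
      result = `error: ${err instanceof Error ? err.message : String(err)}`;
    }

    // Record both the request and its result so the next model call sees them.
    transcript.push({ role: "assistant", content: `tool_call:${reply.call.tool}` });
    transcript.push({ role: "tool", content: result });
  }

  return { transcript, finalText: null, turnsUsed: maxTurns };
}
```

When the turn limit is reached, `finalText` is `null`, so callers can tell an incomplete run apart from a real answer.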

%% alt: Agent architecture showing session persistence and tool execution flow
flowchart TD
    UserInput[User submits prompt]
    LoadSession[Agent loads session state]
    QueryClaude[Agent queries Claude with tools + history]
    ToolCall{Claude requests tool?}
    ExecuteTool[SDK executes tool via MCP]
    UpdateSession[Session persists tool result]
    FinalResponse[Claude returns final answer]
    SaveState[Session state saved to storage]

    UserInput --> LoadSession
    LoadSession --> QueryClaude
    QueryClaude --> ToolCall
    ToolCall -->|Yes| ExecuteTool
    ExecuteTool --> UpdateSession
    UpdateSession --> QueryClaude
    ToolCall -->|No| FinalResponse
    FinalResponse --> SaveState

    classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
    classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
    classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0

    class UserInput userAction
    class QueryClaude,ExecuteTool framework
    class LoadSession,UpdateSession,SaveState dataStore

Sessions serialize to JSON by default, storing messages, tool results, and custom metadata. The default storage is in-memory, so that JSON only survives a process restart once a durable backend is configured. Developers can swap the in-memory storage for Redis, Postgres, or DynamoDB by implementing the SessionStorage interface. This matters when agents run across multiple servers or need audit trails for compliance.
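The storage swap can be pictured with a small sketch. The interface below is an assumption made for illustration: `SessionStore`, `SessionRecord`, and their method names are hypothetical and may not match the SDK's actual SessionStorage contract. The in-memory implementation keeps serialized JSON strings so that it behaves like a remote store would.

```typescript
// Hypothetical shapes for illustration; the SDK's real interface may differ.
interface SessionRecord {
  id: string;
  messages: { role: string; content: string }[];
  metadata: Record<string, string>;
}

interface SessionStore {
  save(record: SessionRecord): Promise<void>;
  load(id: string): Promise<SessionRecord | null>;
}

// Keeps JSON strings rather than object references, so callers always get a
// fresh copy and later mutations of the original object do not leak in.
class InMemorySessionStore implements SessionStore {
  private readonly rows = new Map<string, string>();

  async save(record: SessionRecord): Promise<void> {
    this.rows.set(record.id, JSON.stringify(record));
  }

  async load(id: string): Promise<SessionRecord | null> {
    const raw = this.rows.get(id);
    return raw === undefined ? null : (JSON.parse(raw) as SessionRecord);
  }
}
```

A Redis- or Postgres-backed class would implement the same two methods, replacing the `Map` with calls to the database client. Everything in this in-memory version is lost when the process exits, which is why a durable backend is needed for restart safety.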

[Figure: Production agent architecture diagram]

Building Your First Production Agent: A Complete TypeScript Example

A production agent starts with explicit tool definitions and session configuration. The following example creates a research agent that queries APIs and maintains conversation context across multiple queries:

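import { Agent, Session } from '@anthropic-ai/agent-sdk';
// Assumption: RedisSessionStorage is an application-provided class that
// implements the SessionStorage interface; this module path is hypothetical.
import { RedisSessionStorage } from './storage/redis-session-storage';
import { z } from 'zod';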
 
// Define tool schema with validation
const searchTool = {
  name: 'web_search',
  description: 'Search the web for current information',
  parameters: z.object({
    query: z.string().describe('Search query'),
    max_results: z.number().default(5),
  }),
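  handler: async (params: { query: string; max_results: number }) => {
    // Integration with actual search API (placeholder endpoint)
    const response = await fetch('https://api.search.com/v1/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(params),
    });
    // fetch only rejects on network failure, so surface HTTP errors explicitly
    if (!response.ok) {
      throw new Error(`Search request failed with status ${response.status}`);
    }
    return response.json();
  },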
};
 
// Initialize agent with tools and config
const agent = new Agent({
  apiKey: process.env.ANTHROPIC_API_KEY,
  model: 'claude-3-5-sonnet-20241022',
  tools: [searchTool],
  systemPrompt: 'You are a research assistant. Use web search to find accurate, current information.',
  maxTurns: 10, // Prevent infinite loops
});
 
// Create session with persistence
const session = await Session.create({
  storage: new RedisSessionStorage(process.env.REDIS_URL),
  metadata: { userId: 'user_123', context: 'market_research' },
});
 
// Run agent with input
const response = await agent.runAgent({
  session,
  userMessage: 'What are the latest developments in quantum computing?',
  onToolUse: (tool, params) => {
    console.log(`Executing: ${tool.name}`, params);
  },
});
 
console.log(response.finalMessage);

The maxTurns parameter prevents runaway execution when Claude repeatedly calls tools without converging on an answer. This happens more often than expected, particularly when tool results contain ambiguous data that prompts further exploration. The SDK exits the loop after the limit and returns the last message, allowing developers to handle incomplete results explicitly.

Connecting External Tools with MCP Servers

MCP servers expose existing APIs as agent-compatible tools without custom parsers. For each tool, an MCP server publishes a name, a description, and a JSON Schema describing its input parameters. The SDK discovers these schemas at runtime and turns them into tool definitions whose parameters are validated before execution.

import { MCPClient } from '@anthropic-ai/agent-sdk/mcp';
 
// Connect to existing MCP server
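const mcpClient = new MCPClient({
  serverUrl: 'http://localhost:3001',
  // A URL implies the HTTP transport; 'stdio' would instead launch the
  // server as a child process and would not use serverUrl.
  transport: 'http',
});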
 
// Discover available tools
const tools = await mcpClient.listTools();
 
// Create agent with MCP-provided tools
const agent = new Agent({
  apiKey: process.env.ANTHROPIC_API_KEY,
  tools: tools.map(schema => ({
    name: schema.name,
    description: schema.description,
    parameters: schema.inputSchema,
    handler: async (params) => {
      return mcpClient.callTool(schema.name, params);
    },
  })),
});

This pattern replaces the fragile middleware that teams often build when integrating LLMs with internal systems. Rather than extracting arguments from free-form model output with regular expressions, the SDK validates parameters against the tool's schema before execution. When validation fails, the SDK returns a structured error to Claude, which can then correct the call without developer intervention.
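The idea of validating parameters before execution and handing back a structured error can be sketched without any dependencies. This is a deliberately simplified stand-in, not the SDK's validator and not a JSON Schema implementation: `ParamSchema`, `FieldSpec`, and `validateParams` are hypothetical names, and only flat string, number, and boolean fields are supported.

```typescript
// Hypothetical, simplified schema format used only for this sketch.
type FieldType = "string" | "number" | "boolean";
interface FieldSpec {
  type: FieldType;
  required: boolean;
}
type ParamSchema = Record<string, FieldSpec>;

interface FieldError {
  field: string;
  message: string;
}
type ValidationResult = { ok: true } | { ok: false; errors: FieldError[] };

function validateParams(
  schema: ParamSchema,
  params: Record<string, unknown>,
): ValidationResult {
  const errors: FieldError[] = [];

  // Check every declared field for presence and primitive type.
  for (const field of Object.keys(schema)) {
    const spec = schema[field];
    const value = params[field];
    if (value === undefined) {
      if (spec.required) errors.push({ field, message: "missing required field" });
    } else if (typeof value !== spec.type) {
      errors.push({ field, message: `expected ${spec.type}, got ${typeof value}` });
    }
  }

  // Reject fields the schema does not declare.
  for (const field of Object.keys(params)) {
    if (!Object.prototype.hasOwnProperty.call(schema, field)) {
      errors.push({ field, message: "unknown field" });
    }
  }

  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}
```

Because each error names the offending field, the list can be serialized and returned to the model as the tool result, giving it enough detail to retry with corrected arguments.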

The stdio transport runs the MCP server as a child process of the agent and communicates over stdin and stdout. That is convenient for development but risky in production, because the server's lifecycle and resources are tied to the agent process. HTTP mode deploys MCP servers as separate services that can be monitored and rate limited independently. Teams often start with stdio for prototyping, then migrate to HTTP when agents move to staging environments.

Advanced Patterns: Subagents, Hooks, and Permission Models

Production agents require coordination between specialized subagents, runtime permission checks, and observability hooks. The SDK supports hierarchical agent structures where a coordinator agent delegates tasks to domain-specific subagents, each with isolated tool access and session state.

The failure mode here is subtle but expensive: when a general-purpose agent has access to every tool, it makes poor decisions about which operations to execute. A customer service agent with both refund approval and password reset capabilities might attempt the wrong operation when context is ambiguous. Subagents enforce the principle of least privilege at the architecture level.
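A minimal sketch of least privilege at the architecture level might look like the following. Everything here is hypothetical and unrelated to the SDK's own subagent API: the tool names, the `subagents` table, and the keyword-based `routeIntent` (which stands in for the coordinator's intent analysis) are assumptions chosen to keep the example runnable.

```typescript
// Hypothetical domains and tool names, for illustration only.
type Domain = "refund" | "support";

interface SubagentSpec {
  domain: Domain;
  allowedTools: ReadonlySet<string>;
}

// Each subagent sees only the tools its domain needs.
const subagents: Record<Domain, SubagentSpec> = {
  refund: {
    domain: "refund",
    allowedTools: new Set(["lookup_order", "approve_refund"]),
  },
  support: {
    domain: "support",
    allowedTools: new Set(["lookup_account", "reset_password"]),
  },
};

// Keyword matching stands in for the coordinator's intent analysis, which
// would normally be a model call.
function routeIntent(query: string): Domain {
  return /refund|charge|money back/i.test(query) ? "refund" : "support";
}

// A tool call is allowed only if the tool is on the subagent's allowlist.
function authorizeToolCall(domain: Domain, toolName: string): boolean {
  return subagents[domain].allowedTools.has(toolName);
}
```

With this layout, a refund conversation cannot trigger `reset_password` even if the model asks for it, because the tool is simply absent from that subagent's allowlist.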

%% alt: Subagent delegation flow with permission boundaries
flowchart TD
    UserQuery[User query arrives]
    Coordinator[Coordinator agent analyzes intent]
    RouteDecision{Which domain?}
    RefundAgent[Refund subagent]
    SupportAgent[Support subagent]
    PermCheck[Permission hook validates action]
    PermDenied[Return permission denied]
    Execute[Execute tool via MCP]
    Aggregate[Coordinator aggregates results]
    Response[Return final response]

    UserQuery --> Coordinator
    Coordinator --> RouteDecision
    RouteDecision -->|Refund request| RefundAgent
    RouteDecision -->|Account issue| SupportAgent
    RefundAgent --> PermCheck
    SupportAgent --> PermCheck
    PermCheck -->|Denied| PermDenied
    PermCheck -->|Approved| Execute
    Execute --> Aggregate
    PermDenied --> Aggregate
    Aggregate --> Response

    classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
    classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
    classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0

    class UserQuery userAction
    class Coordinator,RefundAgent,SupportAgent,Execute framework
    class PermCheck,Aggregate dataStore

The SDK's hook system intercepts tool calls before execution, allowing custom logic for permission validation, rate limiting, or audit logging. This matters when agents operate in regulated environments where every action requires compliance evidence:

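// Assumption: refundTool, passwordResetTool, checkPermission, and auditLog
// are application-provided and defined elsewhere.
const agent = new Agent({
  apiKey: process.env.ANTHROPIC_API_KEY,
  tools: [refundTool, passwordResetTool],
  hooks: {
    beforeToolUse: async (tool, params, context) => {
      const userId = context.session.metadata.userId;

      // Check user permissions
      const allowed = await checkPermission(userId, tool.name, params);

      // Log every attempt, including denied ones, for the compliance audit
      await auditLog.write({
        userId,
        action: tool.name,
        params,
        allowed,
        timestamp: new Date(),
      });

      if (!allowed) {
        // Throwing here stops the tool call before it executes
        throw new Error(`Permission denied for ${tool.name}`);
      }
    },
  },
});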

[Figure: Multi-agent coordination pattern]

Claude Agent SDK vs Other Agent Frameworks (LangChain, AutoGPT, CrewAI)

Most agent frameworks prioritize flexibility over production readiness, leaving session management and error handling as exercises for developers. The tradeoffs become clear when comparing architecture decisions across popular frameworks.

%% alt: Comparison of agent framework architectures
flowchart LR
    subgraph ClaudeSDK["Claude Agent SDK: sessions + MCP built-in"]
        SessionAPI[Session API with storage]
        MCPNative[Native MCP integration]
        HookSystem[Hook system for permissions]
        SessionAPI --> MCPNative
        MCPNative --> HookSystem
    end

    subgraph LangChain["LangChain: chains + custom memory"]
        ChainBuilder[Chain composition API]
        CustomMem[Developer implements memory]
        ToolWrapper[Tool wrappers for each LLM]
        ChainBuilder --> CustomMem
        CustomMem --> ToolWrapper
    end

    classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
    classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0

    class SessionAPI,ChainBuilder framework
    class CustomMem,HookSystem dataStore

LangChain provides maximum flexibility through its chain abstraction, allowing developers to compose arbitrary LLM operations into complex workflows. That flexibility comes at the cost of production defaults: teams typically wire up their own session persistence, tool validation, and error recovery. LangChain agents in production therefore tend to carry custom session managers and retry logic that duplicate what the SDK provides out of the box.

AutoGPT optimizes for autonomous operation, running agents that plan multi-step tasks without human intervention. This suits research demos but is risky in production, where uncontrolled autonomy means unpredictable costs and potentially dangerous actions. The SDK's turn limits and permission hooks guard against the runaway behavior that makes AutoGPT a poor fit for customer-facing systems.

CrewAI specializes in multi-agent coordination, providing role-based agent hierarchies similar to the subagent pattern. The framework excels at complex workflows with specialized agents but lacks the SDK's MCP integration and session API. Teams using CrewAI often implement custom tool protocols that duplicate MCP functionality.

The practical implication: the Claude Agent SDK trades flexibility for production-ready defaults. Developers building agents that interact with customers, handle sensitive data, or run at scale benefit from the SDK's opinionated architecture. Teams prototyping novel agent patterns or researching new capabilities may be better served by LangChain's flexibility.

Production Checklist: Error Handling, Rate Limits, and Session Management

Deploying agents to production requires systematic handling of the failure modes that demos ignore: API rate limits, tool execution timeouts, session storage failures, and malformed LLM output. The SDK provides patterns for each scenario, but developers must configure them explicitly.
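Two of these failure modes, timeouts and transient errors, are commonly handled with a timeout wrapper and exponential backoff. The sketch below is a generic pattern rather than an SDK feature: `withTimeout`, `withRetry`, and `RetryOptions` are hypothetical helpers, and the delays omit jitter for brevity.

```typescript
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Rejects if `work` has not settled within `ms`. The underlying operation is
// not cancelled; only the caller stops waiting for it.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

interface RetryOptions {
  attempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the second try
  timeoutMs: number; // per-attempt timeout
}

async function withRetry<T>(
  task: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await withTimeout(task(), opts.timeoutMs);
    } catch (err) {
      lastError = err;
      // Back off 1x, 2x, 4x, ... the base delay, but not after the last try.
      if (attempt < opts.attempts - 1) {
        await sleep(opts.baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

A model or tool call would be wrapped as `withRetry(() => callSomething(), { attempts: 3, baseDelayMs: 500, timeoutMs: 30000 })`, where `callSomething` is a placeholder for the real operation. In practice, retries should be limited to errors that are safe to repeat, such as rate-limit responses, and randomized jitter is usually added to the delay.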

%% alt: Production error handling flow
flowchart TD
    AgentStart[Agent receives request]
    RateCheck{Rate limit check}
    RateDenied[Return 429 with retry-after]
    SessionLoad{Load session}
    SessionFail[Attempt recovery from backup]
    QueryClaude[Query Claude with timeout]
    Timeout{Request timeout?}
    RetryLogic[Exponential backoff retry]
    ToolExec[Execute tool with timeout]
    ToolFail{Tool failed?}
    Fallback[Use fallback response]
    Success[Return result]

    AgentStart --> RateCheck
    RateCheck -->|Exceeded| RateDenied
    RateCheck -->|OK| SessionLoad
    SessionLoad -->|Failed| SessionFail
    SessionLoad -->|Success| QueryClaude
    SessionFail --> QueryClaude
    QueryClaude --> Timeout
    Timeout -->|Yes| RetryLogic
    RetryLogic --> QueryClaude
    Timeout -->|No| ToolExec
    ToolExec --> ToolFail
    ToolFail -->|Yes| Fallback
    ToolFail -->|No| Success
    Fallback --> Success

    classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
    classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
    classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0

    class AgentStart userAction
    class QueryClaude,ToolExec framework
    class SessionLoad,RateCheck,Fallback dataStore

Rate limiting at the application level prevents cost overruns when agents loop indefinitely or users spam requests. The SDK supports per-user and per-session rate limits through the session metadata:

const agent = new Agent({
  hooks: {
    beforeQuery: async (session, context) => {
      const usage = await rateLimiter.check(session.metadata.userId);
      if (usage.exceeded) {