Building Production AI Agents with the Claude Agent SDK and MCP: A TypeScript Deep Dive
The Claude Agent SDK transforms AI agents from proof-of-concept toys into production systems. This guide covers the architecture, MCP integration patterns, and real-world deployment strategies that separate working agents from robust ones.
Most AI agent projects collapse when developers treat them like chatbots with extra steps. The Claude Agent SDK exists because production agents require session management, tool orchestration, and permission boundaries that standard LLM APIs never address. Teams that skip these fundamentals deploy agents that leak context, retry indefinitely on errors, or execute dangerous operations without validation.
The distinction between a demo and a production agent comes down to three architectural decisions: how sessions persist state, how tools integrate without fragile string parsing, and how permission models prevent unintended actions. The Claude Agent SDK makes these decisions explicit through its session API, Model Context Protocol (MCP) server integration, and hook system. When developers ignore these layers, they rebuild worse versions of the same patterns.
What the Claude Agent SDK Actually Is (And Why It's Not Just for Coding)
The Claude Agent SDK is a framework, available for TypeScript and Python, for building stateful, tool-enabled AI agents that run multi-turn conversations with external system access; this guide uses TypeScript throughout. Unlike wrapper libraries that abstract Claude's API, the SDK provides session primitives, tool registration patterns, and an event loop designed for agents that maintain context across hours or days.
The critical insight: agents need memory that survives process restarts. A customer service agent handling a refund request must remember the order ID, previous attempts, and approval status across multiple interactions. The SDK's session system persists this state automatically, exposing methods like continueSession() and getSessionHistory() that handle serialization without custom database logic.
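To make the serialization step concrete, the following is a minimal sketch of a session snapshot round-trip. The `SessionSnapshot` shape and the `serializeSession` and `restoreSession` helpers are hypothetical names chosen for illustration, not the SDK's actual schema or API; the point is only that state which must survive a restart has to be written in a versioned, validated form.

```typescript
// Hypothetical shape of persisted session state; NOT the SDK's real schema.
interface SessionSnapshot {
  version: 1;
  sessionId: string;
  messages: { role: "user" | "assistant" | "tool"; content: string }[];
  metadata: Record<string, string>;
}

// Serialize the snapshot to a JSON string that any key-value store can hold.
function serializeSession(snapshot: SessionSnapshot): string {
  return JSON.stringify(snapshot);
}

// Parse a stored snapshot and check the fields we rely on before trusting it.
function restoreSession(raw: string): SessionSnapshot {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Snapshot is not an object");
  }
  const candidate = parsed as Partial<SessionSnapshot>;
  if (candidate.version !== 1) {
    throw new Error("Unsupported snapshot version");
  }
  if (typeof candidate.sessionId !== "string" || !Array.isArray(candidate.messages)) {
    throw new Error("Snapshot is missing required fields");
  }
  return {
    version: 1,
    sessionId: candidate.sessionId,
    messages: candidate.messages,
    metadata: candidate.metadata ?? {},
  };
}

// Example: the refund conversation keeps its order ID across a simulated restart.
const refundSession: SessionSnapshot = {
  version: 1,
  sessionId: "sess_refund_1",
  messages: [{ role: "user", content: "I want a refund for my order." }],
  metadata: { orderId: "ORD-1001", approvalStatus: "pending" },
};
const restoredSession = restoreSession(serializeSession(refundSession));
```

The version field is a design choice rather than a requirement: it gives a later release a way to reject or migrate snapshots written by an older one.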
MCP servers extend this architecture by turning any external service into a first-class agent tool. Instead of writing brittle functions that parse LLM output and hope the format matches, developers implement standardized tool schemas that the agent calls with validated parameters. The SDK handles the marshaling, error propagation, and retry logic that breaks in hand-rolled implementations.
Core Architecture: Sessions, Tools, and the Agent Loop
The SDK organizes around three core abstractions: the Agent class that orchestrates LLM calls, Session objects that maintain conversation state, and Tool definitions that expose capabilities. The agent loop runs a simple cycle: receive user input, query Claude with available tools and session context, execute any tool calls Claude requests, then loop until Claude returns a final response.
This structure eliminates the common failure mode where developers manually track conversation history, forget to include tool results in follow-up prompts, or lose state when the process crashes. The SDK's runAgent() method handles all of this, exposing hooks for logging and permission checks without requiring reimplementation of the core loop.
%% alt: Agent architecture showing session persistence and tool execution flow
flowchart TD
UserInput[User submits prompt]
LoadSession[Agent loads session state]
QueryClaude[Agent queries Claude with tools + history]
ToolCall{Claude requests tool?}
ExecuteTool[SDK executes tool via MCP]
UpdateSession[Session persists tool result]
FinalResponse[Claude returns final answer]
SaveState[Session state saved to storage]
UserInput --> LoadSession
LoadSession --> QueryClaude
QueryClaude --> ToolCall
ToolCall -->|Yes| ExecuteTool
ExecuteTool --> UpdateSession
UpdateSession --> QueryClaude
ToolCall -->|No| FinalResponse
FinalResponse --> SaveState
classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0
class UserInput userAction
class QueryClaude,ExecuteTool framework
class LoadSession,UpdateSession,SaveState dataStore
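The cycle in the diagram can be sketched without any SDK at all. The version below is an illustration under stated assumptions: `runLoop`, `ModelFn`, `ToolFn`, and the stub model are hypothetical names, the model is a stand-in function rather than a real Claude call, and the turn cap mirrors the idea of a `maxTurns` limit rather than the SDK's implementation of it.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply =
  | { kind: "tool_call"; tool: string; input: Record<string, unknown> }
  | { kind: "final"; text: string };
type ModelFn = (history: Message[]) => Promise<ModelReply>;
type ToolFn = (input: Record<string, unknown>) => Promise<string>;

interface LoopResult {
  stopReason: "final" | "max_turns";
  text: string;
  history: Message[];
}

// Query the model, run any requested tool, feed the result back, repeat.
async function runLoop(
  userMessage: string,
  model: ModelFn,
  tools: Record<string, ToolFn>,
  maxTurns: number,
  history: Message[] = [],
): Promise<LoopResult> {
  const messages: Message[] = [...history, { role: "user", content: userMessage }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await model(messages);
    if (reply.kind === "final") {
      messages.push({ role: "assistant", content: reply.text });
      return { stopReason: "final", text: reply.text, history: messages };
    }
    const tool = tools[reply.tool];
    let result: string;
    try {
      // Unknown tools and thrown errors become tool results the model can read.
      result = tool ? await tool(reply.input) : `Unknown tool: ${reply.tool}`;
    } catch (err) {
      result = `Tool error: ${err instanceof Error ? err.message : String(err)}`;
    }
    messages.push({ role: "assistant", content: `call ${reply.tool}` });
    messages.push({ role: "tool", content: result });
  }
  // The cap was hit before the model produced a final answer.
  return { stopReason: "max_turns", text: "", history: messages };
}

// Stub model: asks for one tool call, then answers once a tool result exists.
const stubModel: ModelFn = async (history) =>
  history.some((m) => m.role === "tool")
    ? { kind: "final", text: "Order ORD-1001 has shipped." }
    : { kind: "tool_call", tool: "lookup_order", input: { orderId: "ORD-1001" } };

const stubTools: Record<string, ToolFn> = {
  lookup_order: async (input) =>
    JSON.stringify({ orderId: input["orderId"], status: "shipped" }),
};
```

Calling `runLoop("Where is my order?", stubModel, stubTools, 5)` walks one tool call and then a final answer. Returning an explicit `stopReason` instead of throwing on the cap lets the caller decide how to present an incomplete result.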
Sessions serialize to JSON by default, storing messages, tool results, and custom metadata in a format that can survive restarts once it is written to durable storage. The default storage is in-memory, so state is lost when the process exits; developers can swap it for Redis, Postgres, or DynamoDB by implementing the SessionStorage interface. This matters when agents run across multiple servers or need audit trails for compliance.
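A pluggable storage layer usually comes down to a small async contract. The sketch below assumes a hypothetical `SessionStore` interface with `load`, `save`, and `delete`; the SDK's real SessionStorage interface may use different method names and types. The in-memory implementation stores JSON strings rather than object references so that it behaves like a remote store would.

```typescript
// Hypothetical stored record; field names are illustrative only.
interface StoredSession {
  id: string;
  messages: string[];
  metadata: Record<string, string>;
}

// Hypothetical storage contract; the SDK's SessionStorage interface may differ.
interface SessionStore {
  load(id: string): Promise<StoredSession | undefined>;
  save(session: StoredSession): Promise<void>;
  delete(id: string): Promise<void>;
}

// Reference implementation backed by a Map. Values are kept as JSON strings,
// so callers never share mutable objects with the store.
class InMemorySessionStore implements SessionStore {
  private readonly sessions = new Map<string, string>();

  async load(id: string): Promise<StoredSession | undefined> {
    const raw = this.sessions.get(id);
    return raw === undefined ? undefined : (JSON.parse(raw) as StoredSession);
  }

  async save(session: StoredSession): Promise<void> {
    this.sessions.set(session.id, JSON.stringify(session));
  }

  async delete(id: string): Promise<void> {
    this.sessions.delete(id);
  }
}
```

A Redis or Postgres adapter would implement the same three methods, which keeps the rest of the agent code unaware of where sessions live.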

Building Your First Production Agent: A Complete TypeScript Example
A production agent starts with explicit tool definitions and session configuration. The following example creates a research agent that queries APIs and maintains conversation context across multiple queries:
// RedisSessionStorage is assumed here to be exported by the SDK;
// adjust the import to wherever your storage adapter lives.
import { Agent, Session, RedisSessionStorage } from '@anthropic-ai/agent-sdk';
import { z } from 'zod';

// Define tool schema with validation
const searchTool = {
  name: 'web_search',
  description: 'Search the web for current information',
  parameters: z.object({
    query: z.string().describe('Search query'),
    max_results: z.number().default(5),
  }),
  handler: async (params: { query: string; max_results: number }) => {
    // Integration with actual search API
    const results = await fetch('https://api.search.com/v1/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(params),
    });
    // Surface HTTP failures instead of parsing an error body as results
    if (!results.ok) {
      throw new Error(`Search request failed with status ${results.status}`);
    }
    return results.json();
  },
};

// Initialize agent with tools and config
const agent = new Agent({
  apiKey: process.env.ANTHROPIC_API_KEY,
  model: 'claude-3-5-sonnet-20241022',
  tools: [searchTool],
  systemPrompt: 'You are a research assistant. Use web search to find accurate, current information.',
  maxTurns: 10, // Prevent infinite loops
});

// Create session with persistence
const session = await Session.create({
  storage: new RedisSessionStorage(process.env.REDIS_URL),
  metadata: { userId: 'user_123', context: 'market_research' },
});

// Run agent with input
const response = await agent.runAgent({
  session,
  userMessage: 'What are the latest developments in quantum computing?',
  onToolUse: (tool, params) => {
    console.log(`Executing: ${tool.name}`, params);
  },
});
console.log(response.finalMessage);

The maxTurns parameter prevents runaway execution when Claude repeatedly calls tools without converging on an answer. This happens more often than expected, particularly when tool results contain ambiguous data that prompts further exploration. The SDK exits the loop after the limit and returns the last message, allowing developers to handle incomplete results explicitly.
Connecting External Tools with MCP Servers
MCP servers transform existing APIs into agent-compatible tools without writing custom parsers. An MCP server advertises its tools through a listing call; each tool carries a name, a description, and a JSON Schema for its input parameters. The SDK discovers these schemas at runtime and turns them into tool definitions the agent can call with validated arguments.
import { Agent } from '@anthropic-ai/agent-sdk';
import { MCPClient } from '@anthropic-ai/agent-sdk/mcp';

// Connect to existing MCP server
const mcpClient = new MCPClient({
  serverUrl: 'http://localhost:3001',
  // A URL implies the HTTP transport; 'stdio' launches the server
  // as a child process and does not use a URL.
  transport: 'http',
});

// Discover available tools
const tools = await mcpClient.listTools();

// Create agent with MCP-provided tools
const agent = new Agent({
  apiKey: process.env.ANTHROPIC_API_KEY,
  tools: tools.map(schema => ({
    name: schema.name,
    description: schema.description,
    parameters: schema.inputSchema,
    // Forward each call to the MCP server under the tool's own name
    handler: async (params: Record<string, unknown>) => {
      return mcpClient.callTool(schema.name, params);
    },
  })),
});

This pattern eliminates the fragile middleware that most teams build when integrating LLMs with internal systems. Instead of parsing LLM output with regex and hoping the format matches, the SDK validates parameters against the schema before execution. When validation fails, the SDK returns structured error messages to Claude, which can then request corrections without developer intervention.
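The validate-then-report idea can be shown with a deliberately small validator. This is a sketch under assumptions: `ToolSchema`, `FieldSpec`, and `validateParams` are hypothetical names, and the schema format is a toy subset invented for this example. A real agent would rely on a full validator such as Zod or a JSON Schema library rather than this hand-written check.

```typescript
// Toy schema format for illustration; real tools would use JSON Schema or Zod.
type FieldSpec = { type: "string" | "number"; required: boolean };
type ToolSchema = Record<string, FieldSpec>;
type FieldError = { field: string; message: string };
type ValidationResult =
  | { ok: true; value: Record<string, string | number> }
  | { ok: false; errors: FieldError[] };

// Check input against the schema and collect every problem, not just the first,
// so the model can correct all of them in a single retry.
function validateParams(schema: ToolSchema, input: Record<string, unknown>): ValidationResult {
  const errors: FieldError[] = [];
  const value: Record<string, string | number> = {};
  for (const field of Object.keys(schema)) {
    const spec = schema[field];
    if (!spec) continue;
    const raw = input[field];
    if (raw === undefined) {
      if (spec.required) errors.push({ field, message: "required field is missing" });
      continue;
    }
    if (typeof raw !== spec.type) {
      errors.push({ field, message: `expected ${spec.type}, got ${typeof raw}` });
      continue;
    }
    value[field] = raw as string | number;
  }
  // Reject fields the schema does not know about.
  for (const field of Object.keys(input)) {
    if (!Object.prototype.hasOwnProperty.call(schema, field)) {
      errors.push({ field, message: "unknown field" });
    }
  }
  return errors.length > 0 ? { ok: false, errors } : { ok: true, value };
}

// Schema mirroring the web_search tool's two parameters.
const searchSchema: ToolSchema = {
  query: { type: "string", required: true },
  max_results: { type: "number", required: false },
};
```

A failed result such as `validateParams(searchSchema, { query: 42 })` carries a list of field-level errors that can be serialized and handed back to the model as the tool result, which is what lets it retry with corrected arguments.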
The stdio transport mode runs MCP servers as child processes of the agent, which is convenient for development but risky in production because the server shares the agent's host and lifecycle. HTTP mode deploys MCP servers as separate services with proper monitoring and rate limiting. Teams often start with stdio for prototyping, then migrate to HTTP when agents move to staging environments.
Advanced Patterns: Subagents, Hooks, and Permission Models
Production agents require coordination between specialized subagents, runtime permission checks, and observability hooks. The SDK supports hierarchical agent structures where a coordinator agent delegates tasks to domain-specific subagents, each with isolated tool access and session state.
The failure mode here is subtle but expensive: when a general-purpose agent has access to every tool, it makes poor decisions about which operations to execute. A customer service agent with both refund approval and password reset capabilities might attempt the wrong operation when context is ambiguous. Subagents enforce the principle of least privilege at the architecture level.
%% alt: Subagent delegation flow with permission boundaries
flowchart TD
UserQuery[User query arrives]
Coordinator[Coordinator agent analyzes intent]
RouteDecision{Which domain?}
RefundAgent[Refund subagent]
SupportAgent[Support subagent]
PermCheck[Permission hook validates action]
PermDenied[Return permission denied]
Execute[Execute tool via MCP]
Aggregate[Coordinator aggregates results]
Response[Return final response]
UserQuery --> Coordinator
Coordinator --> RouteDecision
RouteDecision -->|Refund request| RefundAgent
RouteDecision -->|Account issue| SupportAgent
RefundAgent --> PermCheck
SupportAgent --> PermCheck
PermCheck -->|Denied| PermDenied
PermCheck -->|Approved| Execute
Execute --> Aggregate
PermDenied --> Aggregate
Aggregate --> Response
classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0
class UserQuery userAction
class Coordinator,RefundAgent,SupportAgent,Execute framework
class PermCheck,Aggregate dataStore
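One way to picture the routing and least-privilege boundary in the diagram is a table of subagents, each with its own tool allowlist. The names below (`SubagentSpec`, `routeIntent`, `canUseTool`, and the intent and tool strings) are hypothetical and chosen to match the refund and support example; this is a sketch of the pattern, not the SDK's subagent API.

```typescript
// Each subagent declares which intents it handles and which tools it may call.
interface SubagentSpec {
  name: string;
  intents: readonly string[];
  allowedTools: ReadonlySet<string>;
}

const subagents: readonly SubagentSpec[] = [
  {
    name: "refund",
    intents: ["refund_request"],
    allowedTools: new Set(["lookup_order", "issue_refund"]),
  },
  {
    name: "support",
    intents: ["account_issue"],
    allowedTools: new Set(["lookup_account", "reset_password"]),
  },
];

// Coordinator step: pick the subagent that owns the classified intent.
function routeIntent(intent: string): SubagentSpec | undefined {
  return subagents.find((agent) => agent.intents.indexOf(intent) !== -1);
}

// Boundary check: a subagent can only call tools on its own allowlist.
function canUseTool(agent: SubagentSpec, toolName: string): boolean {
  return agent.allowedTools.has(toolName);
}
```

Because the allowlist lives on the subagent rather than in the prompt, a refund subagent that is talked into requesting `reset_password` is stopped by the check, not by the model's judgment.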
The SDK's hook system intercepts tool calls before execution, allowing custom logic for permission validation, rate limiting, or audit logging. This matters when agents operate in regulated environments where every action requires compliance evidence:
// checkPermission and auditLog are application-provided helpers,
// not part of the SDK.
const agent = new Agent({
  tools: [refundTool, passwordResetTool],
  hooks: {
    beforeToolUse: async (tool, params, context) => {
      // Check user permissions
      const allowed = await checkPermission(
        context.session.metadata.userId,
        tool.name,
        params
      );
      if (!allowed) {
        throw new Error(`Permission denied for ${tool.name}`);
      }
      // Log for compliance audit
      await auditLog.write({
        userId: context.session.metadata.userId,
        action: tool.name,
        params,
        timestamp: new Date(),
      });
    },
  },
});
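A hook that throws raises the question of what the surrounding loop does with the error. The sketch below shows one plausible answer, assuming hypothetical names (`executeWithHooks`, `BeforeToolHook`, `ToolOutcome`): hooks run in order before the handler, and a thrown error becomes a structured denial instead of crashing the run. How the SDK itself treats a throwing hook is not confirmed here.

```typescript
type HookContext = { userId: string; role: "guest" | "admin" };
type ToolParams = Record<string, unknown>;
type BeforeToolHook = (tool: string, params: ToolParams, ctx: HookContext) => Promise<void>;
type ToolOutcome =
  | { status: "ok"; output: string }
  | { status: "denied"; reason: string };

// Run every hook before the handler; the first hook that throws blocks the call.
async function executeWithHooks(
  tool: string,
  params: ToolParams,
  ctx: HookContext,
  hooks: readonly BeforeToolHook[],
  handler: (params: ToolParams) => Promise<string>,
): Promise<ToolOutcome> {
  for (const hook of hooks) {
    try {
      await hook(tool, params, ctx);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      return { status: "denied", reason };
    }
  }
  return { status: "ok", output: await handler(params) };
}

// Example hook: only admins may issue refunds.
const requireAdminForRefunds: BeforeToolHook = async (tool, _params, ctx) => {
  if (tool === "issue_refund" && ctx.role !== "admin") {
    throw new Error(`Permission denied for ${tool}`);
  }
};
```

Returning the denial as data means the coordinator can report it to the user or pass it back to the model, matching the "Return permission denied" branch in the diagram.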
Claude Agent SDK vs Other Agent Frameworks (LangChain, AutoGPT, CrewAI)
Most agent frameworks prioritize flexibility over production readiness, leaving session management and error handling as exercises for developers. The tradeoffs become clear when comparing architecture decisions across popular frameworks.
%% alt: Comparison of agent framework architectures
flowchart LR
subgraph ClaudeSDK["Claude Agent SDK: sessions + MCP built-in"]
SessionAPI[Session API with storage]
MCPNative[Native MCP integration]
HookSystem[Hook system for permissions]
SessionAPI --> MCPNative
MCPNative --> HookSystem
end
subgraph LangChain["LangChain: chains + custom memory"]
ChainBuilder[Chain composition API]
CustomMem[Developer implements memory]
ToolWrapper[Tool wrappers for each LLM]
ChainBuilder --> CustomMem
CustomMem --> ToolWrapper
end
classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0
class SessionAPI,ChainBuilder framework
class CustomMem,HookSystem dataStore
LangChain provides maximum flexibility through its chain abstraction, allowing developers to compose arbitrary LLM operations into complex workflows. That flexibility comes at the expense of production defaults: teams must implement their own session persistence, tool validation, and error recovery. LangChain agents that reach production often carry custom session managers and retry logic that reimplement, less robustly, what the SDK provides out of the box.
AutoGPT optimizes for autonomous operation, running agents that plan multi-step tasks without human intervention. This works for research demos but fails in production where uncontrolled autonomy means unpredictable costs and dangerous actions. The SDK's turn limits and permission hooks prevent the runaway behavior that makes AutoGPT unsuitable for customer-facing systems.
CrewAI specializes in multi-agent coordination, providing role-based agent hierarchies similar to the subagent pattern. The framework excels at complex workflows with specialized agents but lacks the SDK's MCP integration and session API. Teams using CrewAI often implement custom tool protocols that duplicate MCP functionality.
The practical implication: the Claude Agent SDK trades flexibility for production-ready defaults. Developers building agents that interact with customers, handle sensitive data, or run at scale benefit from the SDK's opinionated architecture. Teams prototyping novel agent patterns or researching new capabilities may be better served by LangChain's flexibility.
Production Checklist: Error Handling, Rate Limits, and Session Management
Deploying agents to production requires systematic handling of the failure modes that demos ignore: API rate limits, tool execution timeouts, session storage failures, and malformed LLM output. The SDK provides patterns for each scenario, but developers must configure them explicitly.
%% alt: Production error handling flow
flowchart TD
AgentStart[Agent receives request]
RateCheck{Rate limit check}
RateDenied[Return 429 with retry-after]
SessionLoad{Load session}
SessionFail[Attempt recovery from backup]
QueryClaude[Query Claude with timeout]
Timeout{Request timeout?}
RetryLogic[Exponential backoff retry]
ToolExec[Execute tool with timeout]
ToolFail{Tool failed?}
Fallback[Use fallback response]
Success[Return result]
AgentStart --> RateCheck
RateCheck -->|Exceeded| RateDenied
RateCheck -->|OK| SessionLoad
SessionLoad -->|Failed| SessionFail
SessionLoad -->|Success| QueryClaude
SessionFail --> QueryClaude
QueryClaude --> Timeout
Timeout -->|Yes| RetryLogic
RetryLogic --> QueryClaude
Timeout -->|No| ToolExec
ToolExec --> ToolFail
ToolFail -->|Yes| Fallback
ToolFail -->|No| Success
Fallback --> Success
classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0
class AgentStart userAction
class QueryClaude,ToolExec framework
class SessionLoad,RateCheck,Fallback dataStore
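The retry branch of this flow is commonly implemented as capped exponential backoff. The following is a minimal sketch, assuming hypothetical helpers `backoffDelayMs` and `withRetry` and illustrative defaults (200 ms base, 5 s cap); it omits jitter for readability, which a production version would normally add to avoid synchronized retries.

```typescript
// Delay doubles with each attempt (0-based) and never exceeds the cap.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Default sleep uses a timer; tests can inject a no-op to run instantly.
const timerSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry an async operation up to maxAttempts times, waiting between failures.
// The last error is rethrown once attempts are exhausted.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = timerSleep,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

Making `sleep` injectable keeps the retry policy testable without real delays, and bounding `maxAttempts` avoids the indefinite retry behavior described at the start of this guide.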
Rate limiting at the application level prevents cost overruns when agents loop indefinitely or users spam requests. The SDK supports per-user and per-session rate limits through the session metadata:
const agent = new Agent({
hooks: {
beforeQuery: async (session, context) => {
const usage = await rateLimiter.check(session.metadata.userId);
if (usage.exceeded) {