
MCP SDK v2: Streamable HTTP, Session Resumption, and What It Means for Your Agent Architecture


Most MCP scaling failures stem from stateful stdio transport. The SDK v2 introduces Streamable HTTP and session resumption patterns that fundamentally change how production AI agents handle horizontal scaling.

The pattern is familiar: teams spin up multiple MCP servers behind a load balancer, watch sessions break mid-conversation, then patch the problem with sticky sessions or stateful proxies. The root cause is session state bound to a single server process. MCP SDK v2's Streamable HTTP transport and session resumption patterns are designed to remove this failure mode.

Why MCP SDK v2 Changes Everything for Production AI Agents

The Model Context Protocol launched with stdio as its primary transport, where the client launches the server as a subprocess and talks to it over standard input and output. This made sense for local development and single-tenant deployments. The failure mode emerged when teams put those servers behind a network endpoint and scaled horizontally: session state lives in the server process, so any request landing on a different instance loses the session context.

The SDK v2 specification introduces two critical primitives: Streamable HTTP transport and session resumption tokens. Together they change how agents maintain context across distributed infrastructure.

Streamable HTTP eliminates the process-bound connection requirement. Session resumption tokens allow any server instance to reconstruct context from a serialized state snapshot. The combination means agents can scale horizontally without sticky sessions or stateful routing.

The stdio pattern forces architectural compromises that ripple through the entire stack. WebSocket-based transports improve on this but still tie state to a live connection. Streamable HTTP with resumption tokens decouples context persistence from network topology.

The Stateful Session Problem: Why Horizontal Scaling Was Broken

The stdio transport creates a stateful pipe between client and server. When an agent initiates a conversation, the server process holds the entire context in memory. Tool calls, resource access, and prompt history all accumulate in this process-local state.

The failure occurs when load balancers distribute subsequent requests. Request one hits server A and establishes context. Request two hits server B, which has no knowledge of the prior interaction. The agent loses thread continuity, tools fail to find their state, and the conversation breaks.

%% alt: stdio transport state fragmentation across load-balanced servers
flowchart TD
    Client[Agent Client]
    LB[Load Balancer]
    ServerA[MCP Server A<br/>Session ABC in memory]
    ServerB[MCP Server B<br/>No session state]
    
    Client -->|Request 1: Initialize| LB
    LB --> ServerA
    ServerA -->|Context stored locally| ServerA
    
    Client -->|Request 2: Continue| LB
    LB --> ServerB
    ServerB -->|Context not found| Error[Session Lost]
    
    style Error stroke:#ef4444,fill:#450a0a,color:#fca5a5
    
    classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
    classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
    classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0
    
    class Client,LB userAction
    class ServerA,ServerB framework

Teams respond with sticky sessions, which pin a client to a specific server instance. This works until that instance fails or requires rolling updates. Then sessions drop anyway, but now with the added operational complexity of maintaining routing affinity.

Stateful proxies offer another band-aid. The proxy intercepts requests, tracks session IDs, and routes to the server holding that state. This centralizes the fragility instead of distributing it. The proxy becomes a single point of failure and a scaling bottleneck.

The root issue is coupling session state to process lifecycle. No amount of routing cleverness fixes this architectural dependency.
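The coupling can be reproduced in a few lines. The sketch below is a toy model, not SDK code: `LocalStateServer` is a hypothetical class that keeps sessions in process memory, and two instances stand in for servers A and B behind a balancer without affinity.

```typescript
// Toy model: each instance keeps sessions in its own memory,
// the way a process-bound server does.
class LocalStateServer {
  private sessions = new Map<string, string[]>();

  initialize(sessionId: string): void {
    this.sessions.set(sessionId, []);
  }

  // Returns the history length, or null when this instance
  // has never seen the session.
  continueSession(sessionId: string, message: string): number | null {
    const messages = this.sessions.get(sessionId);
    if (messages === undefined) return null;
    messages.push(message);
    return messages.length;
  }
}

const serverA = new LocalStateServer();
const serverB = new LocalStateServer();

// Request 1 is routed to A and establishes the session.
serverA.initialize("abc");

// Request 2 is routed to B, which has no record of it.
const onServerB = serverB.continueSession("abc", "hello");
console.log(onServerB); // null

// Only the original instance can continue the conversation.
const onServerA = serverA.continueSession("abc", "hello");
console.log(onServerA); // 1
```

No routing rule changes what `serverB` knows. The session exists only inside `serverA`, so it disappears when that process does.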


Streamable HTTP Transport: How It Works Under the Hood

Streamable HTTP replaces the persistent stdio connection with stateless HTTP requests that carry session context. Each request includes a resumption token containing serialized state. The server reconstructs context from this token, processes the request, and returns an updated token.

The protocol uses Server-Sent Events for streaming responses. The client opens an HTTP connection, the server streams partial results as they generate, and the connection closes when complete. The next request opens a fresh connection with the updated resumption token.
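For reference, a Server-Sent Events frame is plain text: optional `id:` and `event:` fields, one or more `data:` lines, and a blank line that ends the event. The helper below is a minimal sketch of that framing. The function name and the JSON payload are illustrative assumptions, not part of the SDK.

```typescript
// Formats one Server-Sent Events frame. Multi-line data is split across
// several "data:" fields, and a blank line terminates the event.
function formatSseEvent(data: string, id?: string): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  for (const line of data.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

// Hypothetical partial result streamed to the client.
const sseFrame = formatSseEvent(JSON.stringify({ partial: "first chunk" }), "1");
console.log(sseFrame);
```

The `id` field is what lets a client tell the server which event it saw last when it reconnects.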

This pattern decouples network transport from session lifecycle. A connection can fail, the client can land on a different server, or the infrastructure can scale up and down, and context continuity is unaffected because the resumption token carries or references all necessary state.
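The request cycle described above can be sketched as a function with no instance-level state. Everything here is an illustrative assumption: the token is the whole context serialized as JSON, and `handleRequest`, `encodeToken`, and `decodeToken` are hypothetical names. A real token would be signed or replaced by an opaque pointer.

```typescript
interface SessionContext {
  messages: string[];
}

interface HandlerResult {
  reply: string;
  token: string; // updated token the client sends with its next request
}

// Hypothetical token encoding: the full context as a JSON string.
function encodeToken(context: SessionContext): string {
  return JSON.stringify(context);
}

function decodeToken(token: string | null): SessionContext {
  return token === null ? { messages: [] } : (JSON.parse(token) as SessionContext);
}

// The handler reads all context from the token and returns a new token.
function handleRequest(message: string, token: string | null): HandlerResult {
  const context = decodeToken(token);
  const updated: SessionContext = { messages: [...context.messages, message] };
  return {
    reply: `received ${updated.messages.length} message(s)`,
    token: encodeToken(updated),
  };
}

// These two calls could run on different server instances.
const firstResult = handleRequest("hello", null);
const secondResult = handleRequest("again", firstResult.token);
console.log(secondResult.reply); // received 2 message(s)
```

Because `handleRequest` never touches shared variables, it does not matter which process executes it.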

%% alt: streamable HTTP request flow with session resumption tokens
flowchart TD
    Client[Agent Client]
    Token1[Resumption Token v1]
    Server[Any MCP Server Instance]
    Token2[Resumption Token v2]
    
    Client -->|HTTP Request + Token v1| Server
    Server -->|Deserialize context| Context[Session Context]
    Context -->|Process tool call| Result[Tool Result]
    Result -->|Serialize updated state| Token2
    Server -->|HTTP Response + Token v2| Client
    Client -->|Store for next request| Token2
    
    classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
    classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
    classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0
    
    class Client userAction
    class Server framework
    class Token1,Token2,Context dataStore

The implementation challenge is state serialization. Resumption tokens must capture tool state, resource handles, and conversation history without bloating request payloads. The SDK v2 specification recommends content-addressed storage for large state blobs, with tokens containing only pointers.

The token holds cryptographic hashes that reference state stored in Redis, S3, or a database, and the server fetches state on demand using these hashes. Tokens stay small while contexts can grow arbitrarily large.
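A minimal sketch of content addressing follows, assuming SHA-256 as the hash and an in-memory `Map` standing in for the backing store. `putBlob`, `getBlob`, and the `PointerToken` shape are hypothetical names chosen for illustration.

```typescript
import { createHash } from "node:crypto";

// In-memory stand-in for Redis, S3, or a database.
const blobStore = new Map<string, string>();

function sha256Hex(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Stores a blob under the hash of its content and returns that hash.
function putBlob(content: string): string {
  const hash = sha256Hex(content);
  blobStore.set(hash, content);
  return hash;
}

// Fetches a blob and verifies it still matches the hash used as its key.
function getBlob(hash: string): string {
  const content = blobStore.get(hash);
  if (content === undefined) throw new Error(`missing blob ${hash}`);
  if (sha256Hex(content) !== hash) throw new Error(`corrupt blob ${hash}`);
  return content;
}

// Hypothetical token shape: pointers only, never the state itself.
interface PointerToken {
  sessionId: string;
  historyHash: string;
}

const historyJson = JSON.stringify(
  Array.from({ length: 500 }, (_, i) => `message ${i}`)
);
const pointerToken: PointerToken = {
  sessionId: "abc",
  historyHash: putBlob(historyJson),
};

console.log(historyJson.length > 5000);                 // true
console.log(JSON.stringify(pointerToken).length < 200); // true
```

The token stays the same size however long the history grows. Identical content always maps to the same key, so repeated saves of unchanged state cost no extra storage.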

Building a Stateless MCP Server with Session Resumption in TypeScript

The SDK v2 API introduces createStreamableHTTPServer and SessionResumptionStore interfaces. Developers implement state persistence logic while the SDK handles HTTP transport and token management.

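import { createHash } from 'node:crypto';
import { createStreamableHTTPServer, SessionResumptionStore } from '@modelcontextprotocol/sdk/server';
import { Redis } from 'ioredis';

// Application-defined session shape (assumed here); it must be JSON-serializable
type SessionState = Record<string, unknown>;

const SESSION_TTL_SECONDS = 3600; // 1 hour

// Session store backed by Redis
class RedisSessionStore implements SessionResumptionStore {
  private redis: Redis;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }

  // Persists the state and returns a token of the form v2:<sessionId>:<sha256>
  async save(sessionId: string, state: SessionState): Promise<string> {
    const stateJson = JSON.stringify(state);
    // Redis SET replies with 'OK', not a digest, so the hash is computed locally
    const hash = createHash('sha256').update(stateJson).digest('hex');
    await this.redis.set(`session:${sessionId}`, stateJson, 'EX', SESSION_TTL_SECONDS);
    return `v2:${sessionId}:${hash}`;
  }

  // Returns null when the session has expired or no longer matches the token
  async load(token: string): Promise<SessionState | null> {
    // Session IDs must not contain ':' for this split to be unambiguous
    const [version, sessionId, hash] = token.split(':');
    if (version !== 'v2' || !sessionId || !hash) throw new Error('Invalid token');

    const stateJson = await this.redis.get(`session:${sessionId}`);
    if (!stateJson) return null;

    // Reject tokens issued for an older snapshot of the state
    const currentHash = createHash('sha256').update(stateJson).digest('hex');
    if (currentHash !== hash) return null;

    return JSON.parse(stateJson);
  }
}

// MCP server with stateless HTTP transport
const redisUrl = process.env.REDIS_URL;
if (!redisUrl) throw new Error('REDIS_URL is not set');
const sessionStore = new RedisSessionStore(redisUrl);

const server = createStreamableHTTPServer({
  name: 'stateless-mcp-server',
  version: '2.0.0',
  sessionStore,
  tools: [{
    name: 'database_query',
    description: 'Execute SQL against application database',
    inputSchema: {
      type: 'object',
      properties: {
        query: { type: 'string' }
      },
      required: ['query']
    },
    handler: async (params, context) => {
      // Context reconstructed from resumption token
      const { query } = params;
      const result = await context.db.execute(query);
      return { result };
    }
  }]
});

server.listen(3000);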

This implementation stores session state in Redis with a one-hour TTL. The token encodes the session ID and a hash for validation. When a request arrives, the server loads state from Redis, processes the tool call, and saves updated state back.
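The validation step can be sketched in isolation. The snippet below assumes the `v2:<sessionId>:<hash>` token layout described above and SHA-256 as the hash. `formatToken`, `parseToken`, and `tokenMatchesState` are hypothetical helpers, not SDK functions. Splitting on the first and last colon lets session IDs contain colons themselves.

```typescript
import { createHash } from "node:crypto";

interface ParsedToken {
  sessionId: string;
  hash: string;
}

function hashState(stateJson: string): string {
  return createHash("sha256").update(stateJson).digest("hex");
}

function formatToken(sessionId: string, stateJson: string): string {
  return `v2:${sessionId}:${hashState(stateJson)}`;
}

// Splits on the first and last colon so session IDs may contain colons.
function parseToken(token: string): ParsedToken {
  const firstColon = token.indexOf(":");
  const lastColon = token.lastIndexOf(":");
  if (firstColon === -1 || firstColon === lastColon) {
    throw new Error("Malformed token");
  }
  if (token.slice(0, firstColon) !== "v2") {
    throw new Error("Invalid token version");
  }
  return {
    sessionId: token.slice(firstColon + 1, lastColon),
    hash: token.slice(lastColon + 1),
  };
}

// True when the stored state is the exact snapshot the token was issued for.
function tokenMatchesState(token: string, storedJson: string): boolean {
  return parseToken(token).hash === hashState(storedJson);
}

const issuedToken = formatToken("tenant:42", '{"step":1}');
console.log(tokenMatchesState(issuedToken, '{"step":1}')); // true
console.log(tokenMatchesState(issuedToken, '{"step":2}')); // false
```

A mismatch means the stored state changed after the token was issued, for example because another request advanced the session. How to handle it (reject, retry, or merge) is an application decision.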

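The critical detail is context reconstruction. The SDK deserializes tool state and conversation history from the state that the resumption token references. Live resources such as database connections cannot be serialized, so they are re-created from stored configuration on each request. Handlers access this context transparently and remain unaware of the underlying persistence mechanism.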


Session Resumption Patterns: Sticky Sessions vs Stateful Proxy vs True Stateless

Three architectural patterns emerge for handling MCP sessions at scale. Each trades off implementation complexity, operational burden, and failure resilience differently.

%% alt: comparison of three session management architectures
flowchart LR
    subgraph StickySession["Sticky Sessions: client pinned to instance"]
        ClientA[Agent Client]
        LBA[Load Balancer<br/>with affinity]
        ServerA1[MCP Server Instance]
        ClientA --> LBA
        LBA -->|Always routes here| ServerA1
        ServerA1 -.->|Instance failure breaks session| FailA[Session Lost]
        style FailA stroke:#ef4444,fill:#450a0a,color:#fca5a5
    end
    
    subgraph StatefulProxy["Stateful Proxy: centralized routing"]
        ClientB[Agent Client]
        ProxyB[Stateful Proxy<br/>tracks sessions]
        ServerB1[MCP Server A]
        ServerB2[MCP Server B]
        ClientB --> ProxyB
        ProxyB -->|Routes by session ID| ServerB1
        ProxyB -->|Routes by session ID| ServerB2
        ProxyB -.->|Proxy becomes bottleneck| BottleB[SPOF]
        style BottleB stroke:#ef4444,fill:#450a0a,color:#fca5a5
    end
    
    subgraph TrueStateless["True Stateless: session in token"]
        ClientC[Agent Client]
        LBC[Load Balancer<br/>round-robin]
        ServerC1[MCP Server A]
        ServerC2[MCP Server B]
        StoreC[Shared State Store<br/>Redis/S3]
        ClientC --> LBC
        LBC --> ServerC1
        LBC --> ServerC2
        ServerC1 <--> StoreC
        ServerC2 <--> StoreC
    end
    
    classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
    classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
    classDef dataStore fill:#1e293b,stroke:#64ffda,color:#e2e8f0
    
    class ClientA,ClientB,ClientC userAction
    class LBA,ProxyB,LBC,ServerA1,ServerB1,ServerB2,ServerC1,ServerC2 framework
    class StoreC dataStore

Sticky sessions route clients to the same server instance using cookies or IP hashing. Implementation is simple but fragile. When an instance fails or restarts, all pinned sessions break. Rolling deployments require careful draining to avoid disruption.

Stateful proxies track session-to-server mappings in memory or a shared store. The proxy inspects session IDs and routes accordingly. This centralizes state management but introduces a single point of failure. The proxy must scale independently and handle failover without losing routing tables.

True stateless architecture with resumption tokens eliminates both failure modes. Clients can hit any server instance because the token that identifies their state travels with each request. When an instance fails, the next request simply lands on a different server. Rolling deployments need no session draining.
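A toy model of the stateless pattern, assuming an in-memory `Map` in place of the shared store and a hypothetical `StatelessServer` class: instances hold no session data, so whichever one receives a request can continue the conversation.

```typescript
// In-memory stand-in for the shared state store (Redis, DynamoDB, S3).
type SharedStore = Map<string, string[]>;

// An instance holds no session data of its own; it only knows the store.
class StatelessServer {
  private readonly store: SharedStore;

  constructor(store: SharedStore) {
    this.store = store;
  }

  // Loads the session, appends the message, saves it back,
  // and returns the new history length.
  handle(sessionId: string, message: string): number {
    const previous = this.store.get(sessionId) ?? [];
    const updated = [...previous, message];
    this.store.set(sessionId, updated);
    return updated.length;
  }
}

const sharedStore: SharedStore = new Map();
const instanceA = new StatelessServer(sharedStore);
const instanceB = new StatelessServer(sharedStore);

const afterFirst = instanceA.handle("abc", "hello");  // routed to A
const afterSecond = instanceB.handle("abc", "again"); // routed to B

// Instance A goes away; B alone carries the session forward.
const afterFailure = instanceB.handle("abc", "still here");
console.log(afterFirst, afterSecond, afterFailure); // 1 2 3
```

The read-modify-write in `handle` is not atomic. With a real store, concurrent requests for the same session would need a transaction or a version check.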

The tradeoff is a dependency on the state store. Redis, DynamoDB, or S3 must be available whenever a server saves or loads the state behind a token. This adds operational complexity but removes the architectural coupling between session lifecycle and server topology.

Migrating from stdio to Streamable HTTP: A Production Refactoring Guide

Migrating existing stdio-based MCP servers requires three changes: transport replacement, state extraction, and client updates. The SDK provides compatibility shims to ease the transition, but the architecture must fundamentally shift.

%% alt: stdio to streamable HTTP migration execution flow
flowchart TD
    Start[Existing stdio MCP Server]
    Audit[Audit stateful dependencies]
    Extract[Extract state into serializable structure]
    Implement[Implement SessionResumptionStore]
    Replace[Replace stdio transport with createStreamableHTTPServer]
    Test[Test session continuity across instances]
    Deploy[Deploy with shared state backend]
    Monitor[Monitor resumption token sizes]
    
    Start --> Audit
    Audit --> Extract
    Extract --> Implement
    Implement --> Replace
    Replace --> Test
    Test --> Deploy
    Deploy --> Monitor
    
    Audit -.->|Found non-serializable state| Block[Refactor required]
    style Block stroke:#ef4444,fill:#450a0a,color:#fca5a5
    
    classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
    classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
    
    class Start,Audit,Extract userAction
    class Implement,Replace,Test,Deploy,Monitor framework

The first step is auditing tool handlers for stateful dependencies. Database connections, file handles, and external API clients often leak into global scope. These must move into serializable session context or be reconstructed per-request.

// Before: global state in stdio server
let dbConnection: DatabaseConnection;

server.setRequestHandler('initialize', async () => {
  dbConnection = await connectToDatabase();
});

// After: state in resumption token
interface SessionState {
  dbConfig: DatabaseConfig; // plain, serializable settings only
}

async function toolHandler(params: { query: string }, context: SessionState) {
  // Reconnect per request from the serialized config
  const db = await connectToDatabase(context.dbConfig);
  try {
    return await db.execute(params.query);
  } finally {
    // Release the connection even when the query throws
    await db.close();
  }
}
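
One way to support the audit is a small checker that walks a candidate state object and reports values that would not survive JSON serialization. This is a sketch under simple assumptions: only plain objects, arrays, strings, booleans, finite numbers, and null count as serializable. `findNonSerializable` and `FakeConnection` are hypothetical names.

```typescript
// Returns the paths of values that would not survive JSON serialization.
function findNonSerializable(value: unknown, path: string = "state"): string[] {
  if (value === null || typeof value === "string" || typeof value === "boolean") {
    return [];
  }
  if (typeof value === "number") return Number.isFinite(value) ? [] : [path];
  // Functions, symbols, bigints, and undefined are dropped or rejected by JSON.
  if (typeof value !== "object") return [path];

  const problems: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      problems.push(...findNonSerializable(item, `${path}[${i}]`));
    });
    return problems;
  }
  // Anything that is not a plain object (connections, sockets, Map, Date)
  // loses behavior or data when serialized.
  if (Object.getPrototypeOf(value) !== Object.prototype) return [path];
  for (const [key, item] of Object.entries(value)) {
    problems.push(...findNonSerializable(item, `${path}.${key}`));
  }
  return problems;
}

// Hypothetical stand-in for a live database connection.
class FakeConnection {
  query(): void {}
}

const candidateState = {
  dbConfig: { host: "localhost", port: 5432 },
  connection: new FakeConnection(),
  onDone: () => {},
  retries: [1, 2, Number.NaN],
};

console.log(findNonSerializable(candidateState));
// [ 'state.connection', 'state.onDone', 'state.retries[2]' ]
```

Each reported path is a candidate for the refactor shown above: keep the configuration in session state and rebuild the live object per request.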

The second step is implementing SessionResumptionStore with your chosen backend. Redis works well for most workloads. S3 suits workloads with large state blobs and infrequent access. DynamoDB balances latency and cost for high-throughput scenarios.

The third step is client updates. Clients must store resumption tokens and include them in subsequent requests. The SDK v2 client handles this automatically, but custom clients require manual token management.
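For custom clients, manual token management can be as small as the sketch below. The `resumptionToken` field name, the request and response shapes, and the injected `Transport` function are all assumptions made for illustration. The source does not specify how the token is carried on the wire.

```typescript
interface McpRequest {
  method: string;
  resumptionToken: string | null; // hypothetical field name
}

interface McpResponse {
  result: unknown;
  resumptionToken: string; // hypothetical field name
}

// Injected so the client can be exercised without a network.
type Transport = (request: McpRequest) => Promise<McpResponse>;

class TokenTrackingClient {
  private token: string | null = null;
  private readonly transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  // Sends the latest token and stores the one returned for the next call.
  async call(method: string): Promise<unknown> {
    const response = await this.transport({ method, resumptionToken: this.token });
    this.token = response.resumptionToken;
    return response.result;
  }
}

// Fake transport that records what it was sent and issues numbered tokens.
const seenTokens: Array<string | null> = [];
const fakeTransport: Transport = async (request) => {
  seenTokens.push(request.resumptionToken);
  return { result: request.method, resumptionToken: `token-${seenTokens.length}` };
};

const trackingClient = new TokenTrackingClient(fakeTransport);
const callsDone = trackingClient
  .call("tools/list")
  .then(() => trackingClient.call("tools/call"))
  .then(() => console.log(seenTokens)); // [ null, 'token-1' ]
```

The first request carries no token and the second carries the token issued by the first response. A production client would also persist the token if conversations must survive a client restart.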

Test session continuity by forcing requests across multiple server instances. Simulate instance failures mid-conversation and verify that the next request resumes cleanly. Monitor token sizes to catch state bloat: tokens over 4 KB indicate insufficient state pruning or missing content addressing.
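A token size check is easy to add to tests or request logging. The sketch below measures the UTF-8 encoded size, since that is what travels over the wire, and uses the 4 KB threshold mentioned above. The helper names are hypothetical.

```typescript
const TOKEN_SIZE_LIMIT_BYTES = 4 * 1024;

// Measures the UTF-8 encoded size rather than the string length,
// because non-ASCII characters take more than one byte.
function tokenSizeBytes(token: string): number {
  return new TextEncoder().encode(token).length;
}

function isTokenOversized(
  token: string,
  limit: number = TOKEN_SIZE_LIMIT_BYTES
): boolean {
  return tokenSizeBytes(token) > limit;
}

console.log(isTokenOversized("v2:abc:" + "f".repeat(64))); // false
console.log(isTokenOversized("x".repeat(5000)));           // true
console.log(tokenSizeBytes("\u00e9"));                     // 2
```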

Architecture Decisions: When to Use Streamable HTTP vs stdio Transport

The transport choice depends on deployment model and scaling requirements. stdio remains appropriate for single-tenant, single-instance deployments. Streamable HTTP becomes necessary when horizontal scaling or zero-downtime deployments matter.

%% alt: decision tree for choosing MCP transport
flowchart TD
    Start[Choose MCP Transport]
    Scale{Need horizontal<br/>scaling?}
    Downtime{Require zero-downtime<br/>deployments?}
    Complexity{Can manage<br/>state backend?}
    
    Start --> Scale
    Scale -->|No| Downtime
    Scale -->|Yes| HTTP[Use Streamable HTTP]
    Downtime -->|No| Stdio[Use stdio transport]
    Downtime -->|Yes| Complexity
    Complexity -->|Yes| HTTP
    Complexity -->|No| Sticky[Use sticky sessions<br/>with stdio]
    
    HTTP --> Backend[Implement Redis/S3<br/>session store]
    Stdio --> Simple[Simple deployment]
    Sticky --> Fragile[Accept session loss<br/>on instance failure]
    
    style Fragile stroke:#ef4444,fill:#450a0a,color:#fca5a5
    
    classDef userAction fill:#1e3a8a,stroke:#60a5fa,color:#e0eaff
    classDef framework fill:#064e3b,stroke:#34d399,color:#6ee7b7
    
    class Start,Scale,Downtime,Complexity userAction
    class HTTP,Stdio,Sticky,Backend,Simple,Fragile framework

stdio transport suits desktop applications, CLI tools, and development environments. The process-local state model simplifies debugging and reduces infrastructure dependencies. Teams using stdio in production accept the constraint of single-instance deployments.

Streamable HTTP suits multi-tenant SaaS platforms, edge deployments, and agent workloads with variable traffic. The stateless model enables autoscaling without session affinity. Teams using Streamable HTTP must operate a state backend and monitor token serialization overhead.

The hybrid approach runs stdio behind sticky sessions. This defers the refactoring effort but limits scaling flexibility. The failure mode is binary: instance failure means session loss. Teams using this pattern should implement aggressive health checks and graceful shutdown to minimize disruption.
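The decision tree above reduces to three boolean questions. The function below is a direct transcription of it, with hypothetical type and field names, for teams that want the rule written down in a design document or a deployment check.

```typescript
type TransportChoice =
  | "streamable-http"
  | "stdio"
  | "stdio-with-sticky-sessions";

interface DeploymentNeeds {
  horizontalScaling: boolean;
  zeroDowntimeDeploys: boolean;
  canOperateStateBackend: boolean;
}

// Mirrors the decision tree: scaling first, then downtime tolerance,
// then whether the team can run a state backend.
function chooseTransport(needs: DeploymentNeeds): TransportChoice {
  if (needs.horizontalScaling) return "streamable-http";
  if (!needs.zeroDowntimeDeploys) return "stdio";
  return needs.canOperateStateBackend
    ? "streamable-http"
    : "stdio-with-sticky-sessions";
}

console.log(
  chooseTransport({
    horizontalScaling: false,
    zeroDowntimeDeploys: true,
    canOperateStateBackend: false,
  })
); // stdio-with-sticky-sessions
```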

The Future of MCP in 2026: What This Means for Your Agent Stack

The SDK v2 specification positions MCP as a true distributed protocol. Session resumption and Streamable HTTP remove the main architectural barriers to running MCP across many instances. Agents can span regions and fail over transparently, with capacity bounded by the state backend rather than by any single server process.

The implication here is broader than transport mechanics. MCP servers become stateless microservices. Tool handlers become pure functions of input parameters and session context. The entire agent stack becomes composable, testable, and independently deployable.

Developers building on MCP in 2026 should adopt Streamable HTTP as the default transport for production. The stdio pattern remains useful for local development but becomes technical debt once a deployment needs more than one instance. Implement session resumption from day one, because retrofitting state persistence into a live system is expensive and risky.

Applied in production, these patterns let agents scale horizontally without sticky sessions, deploy without downtime, and maintain context continuity across infrastructure failures. The protocol now matches the operational reality of distributed systems.