
Build a Rate Limiter in Node.js (Token Bucket Algorithm)


Learn how to protect your API with a token bucket rate limiter in Node.js. Build it from scratch, add Express middleware, and scale with Redis for production.


While I was looking over some production API logs the other day, I came across a pattern that made my stomach drop. A single client was hammering our endpoints with hundreds of requests per second, completely overwhelming our database and causing response times to spike for everyone else. Little did I know that this would lead me down the rabbit hole of rate limiting algorithms, and I'm glad it did.

I was once guilty of thinking rate limiting was something only massive companies needed to worry about. When I finally decided to implement proper rate limiting, I realized this wasn't just about preventing abuse—it was about protecting my infrastructure, ensuring fair access for all users, and preventing my AWS bill from exploding.

Why Rate Limiting Is Critical for API Protection

Let's be honest: the internet isn't always a friendly place. Without rate limiting, your API is vulnerable to:

  • DDoS attacks that can bring your entire service down
  • Resource exhaustion from poorly written client code making infinite loops
  • Cost explosions from excessive database queries or third-party API calls
  • Unfair usage where one user monopolizes shared resources

But here's the thing—not all rate limiting algorithms are created equal. I tried several approaches before discovering that the token bucket algorithm offered the perfect balance between strictness and flexibility.

Understanding the Token Bucket Algorithm

The token bucket algorithm is fascinating because it mimics a real-world concept. Imagine you have a bucket that holds tokens, and each API request costs one token. The bucket refills at a steady rate, but it has a maximum capacity.

Here's how it works:

  1. Initialize a bucket with a maximum capacity (e.g., 100 tokens)
  2. Refill the bucket at a fixed rate (e.g., 10 tokens per second)
  3. When a request arrives, check if there's at least one token available
  4. If yes, remove one token and allow the request
  5. If no, reject the request with a 429 status code

What makes this wonderful is the flexibility. Users can burst up to the bucket capacity when the bucket is full, but they're limited by the refill rate over time. This means legitimate users who occasionally spike aren't punished, while sustained abuse is effectively blocked.


Building a Basic In-Memory Token Bucket

Let me show you how I built my first token bucket implementation. This is a simple in-memory version that's perfect for understanding the core concept:

class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillRate: number;
 
  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate; // tokens per second
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }
 
  private refill(): void {
    const now = Date.now();
    const timePassed = (now - this.lastRefill) / 1000; // convert to seconds
    const tokensToAdd = timePassed * this.refillRate;
    
    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }
 
  consume(tokens: number = 1): boolean {
    this.refill();
    
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }
    
    return false;
  }
 
  getAvailableTokens(): number {
    this.refill();
    return Math.floor(this.tokens);
  }
}
 
// Usage example
const bucket = new TokenBucket(100, 10); // 100 capacity, refills 10/second
 
if (bucket.consume()) {
  console.log('Request allowed!');
} else {
  console.log('Rate limit exceeded!');
}

This implementation taught me something crucial: the refill logic happens on-demand during each consume() call. In other words, we don't need a background timer constantly refilling the bucket. We calculate how many tokens should have been added based on the time elapsed since the last refill.
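The on-demand refill is easy to verify in isolation. Here's a minimal sketch of the same arithmetic as a pure function; the elapsed time is passed in explicitly (my addition, for testability), so no real waiting is needed:

```typescript
// Pure refill arithmetic: given elapsed milliseconds, how many tokens
// should the bucket hold? Mirrors the refill() method above.
function refillTokens(
  current: number,
  capacity: number,
  refillRate: number, // tokens per second
  elapsedMs: number
): number {
  const tokensToAdd = (elapsedMs / 1000) * refillRate;
  return Math.min(capacity, current + tokensToAdd);
}

// A fully drained bucket (0 tokens, capacity 100, 10 tokens/sec):
const afterHalfSecond = refillTokens(0, 100, 10, 500);   // 5 tokens back
const afterOneMinute = refillTokens(0, 100, 10, 60_000); // capped at 100

console.log(afterHalfSecond, afterOneMinute); // 5 100
```

Because the math only depends on elapsed time, it doesn't matter whether one request arrives after ten seconds or ten requests arrive one second apart: the bucket ends up with the same token count.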

Creating Express Middleware for Rate Limiting

Now let's make this practical. I cannot stress this enough—middleware is the perfect place for rate limiting because it runs before your route handlers. Here's how I transformed that basic implementation into production-ready Express middleware:

import express, { Request, Response, NextFunction } from 'express';
 
class RateLimiter {
  private buckets: Map<string, TokenBucket>;
  private readonly capacity: number;
  private readonly refillRate: number;
 
  constructor(capacity: number, refillRate: number) {
    this.buckets = new Map();
    this.capacity = capacity;
    this.refillRate = refillRate;
  }
 
  private getClientId(req: Request): string {
    // Use the user ID if authenticated, otherwise fall back to the IP.
    // Note: req.user is not on Express's Request type by default; this
    // assumes your auth middleware augments the type declaration.
    return req.user?.id || req.ip || 'unknown';
  }
 
  middleware() {
    return (req: Request, res: Response, next: NextFunction) => {
      const clientId = this.getClientId(req);
      
      // Get or create bucket for this client
      if (!this.buckets.has(clientId)) {
        this.buckets.set(clientId, new TokenBucket(this.capacity, this.refillRate));
      }
 
      const bucket = this.buckets.get(clientId)!;
      
      if (bucket.consume()) {
        // Add rate limit headers for transparency
        res.setHeader('X-RateLimit-Limit', this.capacity.toString());
        res.setHeader('X-RateLimit-Remaining', bucket.getAvailableTokens().toString());
        next();
      } else {
        res.setHeader('X-RateLimit-Limit', this.capacity.toString());
        res.setHeader('X-RateLimit-Remaining', '0');
        res.setHeader('Retry-After', '1'); // seconds
        res.status(429).json({
          error: 'Too many requests',
          message: 'Rate limit exceeded. Please try again later.'
        });
      }
    };
  }
 
  // Cleanup old buckets to prevent memory leaks
  cleanup(): void {
    const threshold = Date.now() - (60 * 60 * 1000); // 1 hour ago
    for (const [clientId, bucket] of this.buckets.entries()) {
      // Bracket access reads the private lastRefill field without needing a getter
      if (bucket['lastRefill'] < threshold) {
        this.buckets.delete(clientId);
      }
    }
  }
}
 
// Express app setup
const app = express();
const limiter = new RateLimiter(100, 10);
 
// Apply rate limiting to all routes
app.use(limiter.middleware());
 
// Or apply to specific routes
app.get('/api/expensive', limiter.middleware(), (req, res) => {
  res.json({ data: 'This endpoint is protected' });
});
 
// Run cleanup every hour
setInterval(() => limiter.cleanup(), 60 * 60 * 1000);

Notice how I added rate limit headers? This was a game-changer for debugging. Clients can see exactly how many requests they have remaining, which dramatically reduced support tickets.
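Since the headers follow a predictable format, a well-behaved client can read them and back off before ever hitting a 429. Here's a hypothetical client-side helper; the header names match the middleware above, while the 10% threshold is an arbitrary choice of mine:

```typescript
// Parse the rate-limit headers a client receives, so calling code can
// slow down proactively instead of waiting for a 429 response.
interface RateLimitInfo {
  limit: number;
  remaining: number;
  nearLimit: boolean; // fewer than 10% of tokens left
}

function parseRateLimitHeaders(headers: Record<string, string>): RateLimitInfo {
  const limit = Number(headers['x-ratelimit-limit'] ?? 0);
  const remaining = Number(headers['x-ratelimit-remaining'] ?? 0);
  return { limit, remaining, nearLimit: limit > 0 && remaining < limit * 0.1 };
}

const info = parseRateLimitHeaders({
  'x-ratelimit-limit': '100',
  'x-ratelimit-remaining': '7',
});
console.log(info.nearLimit); // true: only 7 of 100 tokens remain
```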


Scaling with Redis: Distributed Rate Limiting

Here's where things got interesting. The in-memory approach worked wonderfully for a single server, but when I scaled to multiple instances behind a load balancer, I realized each server was tracking limits independently, so a client could effectively get N times the intended limit across N servers. Luckily, we can use Redis to share state across all of them:

import Redis from 'ioredis';
 
class DistributedTokenBucket {
  private redis: Redis;
  private readonly keyPrefix: string;
  private readonly capacity: number;
  private readonly refillRate: number;
 
  constructor(redis: Redis, capacity: number, refillRate: number) {
    this.redis = redis;
    this.keyPrefix = 'rate-limit:';
    this.capacity = capacity;
    this.refillRate = refillRate;
  }
 
  async consume(clientId: string, tokens: number = 1): Promise<boolean> {
    const key = `${this.keyPrefix}${clientId}`;
    const now = Date.now();
 
    // Lua script ensures atomic operations
    const script = `
      local key = KEYS[1]
      local capacity = tonumber(ARGV[1])
      local refillRate = tonumber(ARGV[2])
      local tokens = tonumber(ARGV[3])
      local now = tonumber(ARGV[4])
 
      local bucket = redis.call('HMGET', key, 'tokens', 'lastRefill')
      local availableTokens = tonumber(bucket[1]) or capacity
      local lastRefill = tonumber(bucket[2]) or now
 
      -- Calculate refill
      local timePassed = (now - lastRefill) / 1000
      local tokensToAdd = timePassed * refillRate
      availableTokens = math.min(capacity, availableTokens + tokensToAdd)
 
      -- Try to consume
      if availableTokens >= tokens then
        availableTokens = availableTokens - tokens
        redis.call('HMSET', key, 'tokens', availableTokens, 'lastRefill', now)
        redis.call('EXPIRE', key, 3600) -- expire after 1 hour
        return {1, math.floor(availableTokens)}
      else
        return {0, math.floor(availableTokens)}
      end
    `;
 
    const result = await this.redis.eval(
      script,
      1,
      key,
      this.capacity,
      this.refillRate,
      tokens,
      now
    ) as [number, number];
 
    return result[0] === 1;
  }
}

Using Lua scripts was crucial because it guarantees atomicity. The refill calculation and token consumption happen as a single operation, preventing race conditions when multiple servers access the same client's bucket simultaneously.
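Wiring the distributed bucket into Express only requires an async consume function. Here's a sketch of such a middleware factory; `MinimalReq`/`MinimalRes` are structural stand-ins of mine for the Express types, so the logic can be exercised with a stub instead of a live Redis connection:

```typescript
// The Redis-backed consume() is async, so the middleware awaits it.
// Any async consume(clientId) function works here: DistributedTokenBucket
// in production, or a simple stub in tests.
type Consume = (clientId: string) => Promise<boolean>;

interface MinimalReq { ip: string }
interface MinimalRes { status(code: number): MinimalRes; json(body: unknown): void }

function distributedLimiter(consume: Consume) {
  return async (req: MinimalReq, res: MinimalRes, next: () => void) => {
    if (await consume(req.ip)) {
      next();
    } else {
      res.status(429).json({ error: 'Too many requests' });
    }
  };
}

// Stubbed consume that allows the first 2 requests, then rejects:
let calls = 0;
const mw = distributedLimiter(async () => ++calls <= 2);

const results: number[] = [];
const res: MinimalRes = {
  status(code) { results.push(code); return this; },
  json() {},
};

const demo = (async () => {
  for (let i = 0; i < 3; i++) {
    await mw({ ip: '1.2.3.4' }, res, () => results.push(200));
  }
})();
```

Injecting the consume function this way also makes it trivial to unit-test the middleware without spinning up Redis.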

Token Bucket vs Leaky Bucket vs Fixed Window

When I was researching rate limiting algorithms, I got confused by all the options. Let me break down what I learned:

Token Bucket (what we built):

  • Allows bursts up to bucket capacity
  • Smooth refill rate over time
  • Best for APIs where occasional spikes are acceptable

Leaky Bucket:

  • Processes requests at a constant rate
  • Requests queue up when they exceed the rate
  • Best when you want perfectly smooth output

Fixed Window:

  • Simple counter that resets every time window
  • Vulnerable to burst attacks at window boundaries
  • Easiest to implement but least sophisticated

I chose token bucket because it balances flexibility with protection. Users can burst when they need to, but sustained abuse is still blocked.
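To make the fixed-window weakness concrete, here's a minimal sketch of a fixed-window counter. The injectable clock is my addition for testability; it shows a client getting double the limit through within a millisecond of a window boundary:

```typescript
// Fixed-window counter, shown for contrast with the token bucket: a client
// can send `limit` requests at the end of one window and `limit` more at the
// start of the next, i.e. 2x the limit in a near-instant burst.
class FixedWindow {
  private count = 0;
  private windowStart: number;

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
    now: number = Date.now() // injectable clock for testing
  ) {
    this.windowStart = now;
  }

  allow(now: number = Date.now()): boolean {
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0; // counter resets; burst history is forgotten
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }
}

// Limit of 5 per 1-second window; 5 requests at t=999ms, 5 more at t=1000ms:
const fw = new FixedWindow(5, 1000, 0);
let allowed = 0;
for (let i = 0; i < 5; i++) if (fw.allow(999)) allowed++;
for (let i = 0; i < 5; i++) if (fw.allow(1000)) allowed++;
console.log(allowed); // all 10 allowed, 1ms apart, despite a limit of 5
```

A token bucket with capacity 5 would have allowed the first burst and then throttled the second to its refill rate.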

Advanced Patterns: Per-Endpoint and Per-User Limits

In production, I realized different endpoints need different limits. Here's how I implemented tiered rate limiting:

interface RateLimitConfig {
  capacity: number;
  refillRate: number;
}
 
const rateLimits: Record<string, RateLimitConfig> = {
  '/api/search': { capacity: 20, refillRate: 2 },     // expensive
  '/api/users': { capacity: 100, refillRate: 10 },     // moderate
  '/api/health': { capacity: 1000, refillRate: 100 },  // cheap
};
 
const limiters = new Map<string, RateLimiter>();

function createSmartLimiter() {
  return (req: Request, res: Response, next: NextFunction) => {
    const config = rateLimits[req.path] || { capacity: 50, refillRate: 5 };
    // Reuse one limiter per path: constructing a fresh RateLimiter on every
    // request would reset the buckets and never actually limit anything.
    if (!limiters.has(req.path)) {
      limiters.set(req.path, new RateLimiter(config.capacity, config.refillRate));
    }
    return limiters.get(req.path)!.middleware()(req, res, next);
  };
}

I also implemented premium tiers where paid users get higher limits. This turned rate limiting from a protective measure into a monetization strategy.
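Here's a sketch of what that tier lookup might look like; the tier names and numbers are illustrative, not the actual values from my setup:

```typescript
// Look up rate-limit config by the user's plan, falling back to the free tier.
interface TierConfig { capacity: number; refillRate: number }

const tiers: Record<string, TierConfig> = {
  free:    { capacity: 50,   refillRate: 5 },
  pro:     { capacity: 500,  refillRate: 50 },
  premium: { capacity: 2000, refillRate: 200 },
};

function limitsForPlan(plan: string | undefined): TierConfig {
  // Unknown or missing plans get the free-tier limits.
  return tiers[plan ?? 'free'] ?? tiers['free'];
}

console.log(limitsForPlan('pro'));     // { capacity: 500, refillRate: 50 }
console.log(limitsForPlan(undefined)); // falls back to the free tier
```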

Production Considerations and Testing Strategies

Before deploying to production, I learned several hard lessons:

Memory Management: In-memory buckets can leak if you don't clean them up. Set up periodic cleanup or use Redis with expiry.

Header Transparency: Always include X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After headers. Your future self will thank you.

Testing: I wrote tests that verified burst allowance, sustained rate limits, and proper rejection. Use setTimeout to test time-based refills.

Monitoring: Track rate limit rejections in your metrics. Sudden spikes often indicate either abuse or a legitimate client with a bug.

Graceful Degradation: When Redis is down, fail open (allow requests) rather than fail closed. Availability beats perfect rate limiting.
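A minimal sketch of that fail-open pattern; `consumeFailOpen` is a hypothetical wrapper name of mine, and `consume` stands in for a call like `DistributedTokenBucket.consume` from above:

```typescript
// Fail-open wrapper: if the backing store is unreachable (e.g. Redis is
// down), allow the request instead of rejecting everyone.
async function consumeFailOpen(
  consume: (clientId: string) => Promise<boolean>,
  clientId: string
): Promise<boolean> {
  try {
    return await consume(clientId);
  } catch (err) {
    // Log it, then allow: availability beats perfect rate limiting.
    console.error('rate limiter unavailable, failing open:', err);
    return true;
  }
}
```

Note the asymmetry: an ordinary `false` from the limiter is still respected; only an *error* reaching the store triggers the fail-open path.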

And that concludes this post! I hope you found it valuable, and keep an eye out for more in the future!