System Design (medium)

Design a Rate Limiter

Token bucket vs. sliding window: the classic API gateway problem, with a diagram you can sketch in 15 minutes.

Rate limiters protect your API from abuse, runaway clients, and accidental DDoS from a buggy cron job. Every system design interview eventually asks: how do you throttle fairly at scale?

Requirements (say these first)

High-level architecture

Algorithm trade-offs

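| Algorithm              | Pros          | Cons                     |
|------------------------|---------------|--------------------------|
| Fixed window           | Simple, fast  | Burst at window boundary |
| Sliding window log     | Accurate      | Memory per request       |
| Token bucket           | Smooth bursts | Slightly more state      |
| Sliding window counter | Good balance  | Approximation at edges   |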
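
To make the "approximation at edges" trade-off concrete, here is a minimal sketch of a sliding window counter. The class name, the `clock` parameter, and the weighting scheme (previous window's count scaled by how much of it the sliding window still covers) are illustrative assumptions, not a reference implementation.

```python
import time
from typing import Callable


class SlidingWindowCounter:
    """Approximate sliding window: weights the previous fixed window's count."""

    def __init__(self, limit: int, window_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_sec = window_sec
        self.clock = clock      # injectable (an assumption) so the sketch is testable
        self.window_index = 0   # which fixed window we are currently in
        self.current = 0        # requests counted in the current window
        self.previous = 0       # requests counted in the window before it

    def allow(self) -> bool:
        now = self.clock()
        index = int(now // self.window_sec)
        if index != self.window_index:
            # Roll over; if more than one window has passed, nothing carries over.
            self.previous = self.current if index == self.window_index + 1 else 0
            self.current = 0
            self.window_index = index
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now % self.window_sec) / self.window_sec
        estimated = self.previous * overlap + self.current
        if estimated < self.limit:
            self.current += 1
            return True
        return False
```

The estimate assumes requests in the previous window were spread evenly, which is where the edge approximation comes from. In exchange, the state is two counters per key instead of one timestamp per request.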

Token bucket (common interview answer)

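python
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity              # maximum burst size
        self.tokens = float(capacity)         # start with a full bucket
        self.refill_per_sec = refill_per_sec  # sustained request rate
        self.last_refill = time.monotonic()   # monotonic: unaffected by wall-clock jumps

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Refill lazily from elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.refill_per_sec,
        )
        self.last_refill = now
        # Spend one token per request; reject when less than one is left.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False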
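
In practice a limiter tracks one bucket per caller rather than one global bucket. The sketch below applies the same refill logic per key; `KeyedTokenBucket`, its `clock` parameter, and the in-memory dict are hypothetical names chosen for illustration, and a real service would also need eviction for idle keys.

```python
import time
from typing import Callable, Dict, Tuple


class KeyedTokenBucket:
    """One token bucket per key (user ID, IP, or API key), created lazily."""

    def __init__(self, capacity: int, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable (an assumption) so the sketch is testable
        # key -> (tokens remaining, time of last refill)
        self.state: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        # Unknown keys start with a full bucket.
        tokens, last = self.state.get(key, (float(self.capacity), now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.state[key] = (tokens, now)
        return allowed
```

This version is not thread-safe and its dict grows without bound, so treat it as a whiteboard sketch rather than production code.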

Distributed gotcha

If each gateway instance keeps its own counter, the limit multiplies with the fleet: a user capped at 100 req/s per node could hit 1000 req/s by fanning across 10 nodes. Centralize state in Redis (or use a gossip/coordinated counter with careful consistency trade-offs).
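
One way to sketch shared state is a fixed-window counter built on Redis `INCR` and `EXPIRE`. The function name, key format, and `now` parameter below are assumptions for illustration; `client` is assumed to be a redis-py style object exposing `incr(name)` and `expire(name, seconds)`, such as one created with `redis.Redis(host="localhost", port=6379)`.

```python
import time


def allow_request(client, key: str, limit: int, window_sec: int, now=None) -> bool:
    """Fixed-window check against a counter shared by all gateway nodes.

    `client` is assumed to offer redis-py style incr(name) and expire(name, seconds).
    """
    # Wall-clock time, because every node must agree on the current window.
    now = time.time() if now is None else now
    window = int(now // window_sec)
    # Hypothetical key layout: the same caller and window map to the same key on every node.
    redis_key = f"ratelimit:{key}:{window}"
    count = client.incr(redis_key)  # INCR is atomic on the Redis server
    if count == 1:
        # First hit in this window: let the key expire after the window has passed.
        client.expire(redis_key, window_sec * 2)
    return count <= limit
```

Because this is a fixed window, it inherits the boundary-burst weakness from the table above. It also leaves a key without a TTL if the process dies between the two calls, and it depends on reasonably synchronized clocks across nodes. A Lua script or a token bucket stored in Redis can address some of this at the cost of more moving parts.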

What to say in the room

  1. Clarify who is limited (user, IP, API key) and what counts (read vs write)
  2. Pick an algorithm and name its weakness
  3. Put Redis on the diagram — interviewers want to see shared state
  4. Mention Retry-After header and idempotency for write endpoints
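
For the Retry-After point, a small sketch of how the header value could be derived from token bucket state. Both function names and the response dict shape are hypothetical; the only facts relied on are that HTTP 429 means Too Many Requests and that Retry-After accepts a delay in whole seconds.

```python
import math


def retry_after_seconds(tokens: float, refill_per_sec: float) -> int:
    """Whole seconds until a token bucket holds at least one token."""
    if tokens >= 1:
        return 0
    return math.ceil((1 - tokens) / refill_per_sec)


def rejection(tokens: float, refill_per_sec: float) -> dict:
    """Hypothetical response shape: status code plus headers."""
    return {
        "status": 429,  # Too Many Requests
        "headers": {"Retry-After": str(retry_after_seconds(tokens, refill_per_sec))},
    }
```

Rounding up keeps clients from retrying a fraction of a second too early and being rejected again.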

The diagram + one algorithm + Redis is usually enough for a 35-minute system design slot.