Rate Limiting
Core Concept
Controlling the rate of requests to protect systems from overload and ensure fair usage
Rate Limiting
Overview
Rate limiting is a technique for controlling the number of requests a user, service, or system can make to an API or resource within a specific time period. It serves as a critical defense mechanism to prevent system overload, ensure fair resource allocation, and protect against malicious attacks while maintaining service quality for legitimate users.
Implemented by virtually every major service including Twitter, GitHub, Stripe, and AWS, rate limiting is essential for building scalable and secure systems. It helps prevent abuse, manages costs, ensures SLA compliance, and provides predictable performance characteristics under varying load conditions.
The main technical challenges this addresses include:
- Resource protection: Preventing system overload and maintaining performance
- Fair usage: Ensuring equitable access to shared resources across users
- Attack mitigation: Defending against DDoS, brute force, and scraping attacks
- Cost control: Managing operational costs in usage-based pricing models
Core Principles: Rate Limiting Algorithms
Token Bucket Algorithm
The token bucket algorithm maintains a bucket that holds tokens, with tokens being added at a constant rate. Each request consumes one or more tokens, and requests are allowed only if sufficient tokens are available.
class TokenBucket {
constructor(capacity, refillRate, refillPeriod = 1000) {
this.capacity = capacity; // Maximum tokens in bucket
this.tokens = capacity; // Current token count
this.refillRate = refillRate; // Tokens added per period
this.refillPeriod = refillPeriod; // Period in milliseconds
this.lastRefill = Date.now();
}
consume(tokensRequested = 1) {
this.refill();
if (this.tokens >= tokensRequested) {
this.tokens -= tokensRequested;
return {
allowed: true,
remainingTokens: this.tokens,
retryAfter: null
};
}
// Calculate when enough tokens will be available
const tokensNeeded = tokensRequested - this.tokens;
const timeToRefill = (tokensNeeded / this.refillRate) * this.refillPeriod;
return {
allowed: false,
remainingTokens: this.tokens,
retryAfter: Math.ceil(timeToRefill / 1000) // seconds
};
}
refill() {
const now = Date.now();
const timePassed = now - this.lastRefill;
if (timePassed >= this.refillPeriod) {
const periodsElapsed = Math.floor(timePassed / this.refillPeriod);
const tokensToAdd = periodsElapsed * this.refillRate;
this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
this.lastRefill = now;
}
}
peek() {
this.refill();
return {
tokens: this.tokens,
capacity: this.capacity,
nextRefill: this.lastRefill + this.refillPeriod
};
}
}
// Usage example
const apiLimiter = new TokenBucket(
100, // 100 requests capacity
50, // 50 requests per second refill rate
1000 // 1 second refill period
);
// Handle API request
function handleAPIRequest(req, res) {
const result = apiLimiter.consume(1);
if (!result.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
retryAfter: result.retryAfter
});
}
// Process request
res.json({ data: 'Success', remaining: result.remainingTokens });
}
Sliding Window Log Algorithm
class SlidingWindowLog {
constructor(limit, windowSizeMs) {
this.limit = limit;
this.windowSizeMs = windowSizeMs;
this.requests = [];
}
isAllowed(userId = 'default') {
const now = Date.now();
const windowStart = now - this.windowSizeMs;
// Clean old requests outside the window
this.requests = this.requests.filter(req =>
req.timestamp > windowStart && req.userId === userId
);
if (this.requests.length < this.limit) {
this.requests.push({
userId,
timestamp: now
});
return {
allowed: true,
count: this.requests.length,
resetTime: windowStart + this.windowSizeMs
};
}
// Find when the oldest request will expire
const oldestRequest = Math.min(...this.requests.map(r => r.timestamp));
const resetTime = oldestRequest + this.windowSizeMs;
return {
allowed: false,
count: this.requests.length,
resetTime,
retryAfter: Math.ceil((resetTime - now) / 1000)
};
}
getStatus(userId = 'default') {
const now = Date.now();
const windowStart = now - this.windowSizeMs;
const userRequests = this.requests.filter(req =>
req.timestamp > windowStart && req.userId === userId
);
return {
count: userRequests.length,
limit: this.limit,
remaining: Math.max(0, this.limit - userRequests.length),
resetTime: windowStart + this.windowSizeMs
};
}
}
Fixed Window Counter
class FixedWindowCounter {
constructor(limit, windowSizeMs) {
this.limit = limit;
this.windowSizeMs = windowSizeMs;
this.windows = new Map(); // userId -> { count, windowStart }
}
isAllowed(userId = 'default') {
const now = Date.now();
const currentWindow = Math.floor(now / this.windowSizeMs);
const userWindow = this.windows.get(userId);
if (!userWindow || userWindow.windowStart !== currentWindow) {
// New window or user
this.windows.set(userId, {
count: 1,
windowStart: currentWindow
});
return {
allowed: true,
count: 1,
remaining: this.limit - 1,
resetTime: (currentWindow + 1) * this.windowSizeMs
};
}
if (userWindow.count < this.limit) {
userWindow.count++;
return {
allowed: true,
count: userWindow.count,
remaining: this.limit - userWindow.count,
resetTime: (currentWindow + 1) * this.windowSizeMs
};
}
return {
allowed: false,
count: userWindow.count,
remaining: 0,
resetTime: (currentWindow + 1) * this.windowSizeMs,
retryAfter: Math.ceil(((currentWindow + 1) * this.windowSizeMs - now) / 1000)
};
}
// Cleanup old windows periodically
cleanup() {
const now = Date.now();
const currentWindow = Math.floor(now / this.windowSizeMs);
for (const [userId, window] of this.windows.entries()) {
if (window.windowStart < currentWindow - 1) {
this.windows.delete(userId);
}
}
}
}
Sliding Window Counter (Hybrid)
class SlidingWindowCounter {
constructor(limit, windowSizeMs) {
this.limit = limit;
this.windowSizeMs = windowSizeMs;
this.counters = new Map(); // userId -> { current, previous, windowStart }
}
isAllowed(userId = 'default') {
const now = Date.now();
const currentWindow = Math.floor(now / this.windowSizeMs);
const windowProgress = (now % this.windowSizeMs) / this.windowSizeMs;
let userCounter = this.counters.get(userId);
if (!userCounter) {
userCounter = {
current: 0,
previous: 0,
windowStart: currentWindow
};
this.counters.set(userId, userCounter);
}
// Check if we've moved to a new window
if (userCounter.windowStart < currentWindow) {
const windowsElapsed = currentWindow - userCounter.windowStart;
if (windowsElapsed === 1) {
// Moved to next window
userCounter.previous = userCounter.current;
userCounter.current = 0;
} else {
// Skipped windows (user was inactive)
userCounter.previous = 0;
userCounter.current = 0;
}
userCounter.windowStart = currentWindow;
}
// Calculate weighted count based on window position
const weightedCount = userCounter.previous * (1 - windowProgress) + userCounter.current;
if (weightedCount < this.limit) {
userCounter.current++;
return {
allowed: true,
count: Math.ceil(weightedCount + 1),
remaining: Math.floor(this.limit - weightedCount - 1),
resetTime: (currentWindow + 1) * this.windowSizeMs
};
}
return {
allowed: false,
count: Math.ceil(weightedCount),
remaining: 0,
resetTime: (currentWindow + 1) * this.windowSizeMs,
retryAfter: Math.ceil(((currentWindow + 1) * this.windowSizeMs - now) / 1000)
};
}
}
Algorithm Comparison Visualization
System Architecture Diagram
Production Implementation Patterns
Distributed Rate Limiting with Redis
class DistributedRateLimiter {
constructor(redis, defaultConfig = {}) {
this.redis = redis;
this.defaultLimit = defaultConfig.limit || 100;
this.defaultWindow = defaultConfig.window || 3600000; // 1 hour
this.algorithm = defaultConfig.algorithm || 'sliding_window_counter';
}
async checkLimit(key, options = {}) {
const config = {
limit: options.limit || this.defaultLimit,
window: options.window || this.defaultWindow,
algorithm: options.algorithm || this.algorithm
};
switch (config.algorithm) {
case 'token_bucket':
return await this.tokenBucketCheck(key, config);
case 'sliding_window_counter':
return await this.slidingWindowCounterCheck(key, config);
case 'fixed_window':
return await this.fixedWindowCheck(key, config);
default:
throw new Error(`Unknown algorithm: ${config.algorithm}`);
}
}
async tokenBucketCheck(key, config) {
const script = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local refill_period = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
local now = tonumber(ARGV[5])
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
-- Refill tokens
local time_passed = math.max(0, now - last_refill)
local periods_elapsed = math.floor(time_passed / refill_period)
if periods_elapsed > 0 then
local tokens_to_add = periods_elapsed * refill_rate
tokens = math.min(capacity, tokens + tokens_to_add)
last_refill = last_refill + (periods_elapsed * refill_period)
end
-- Check if request can be fulfilled
if tokens >= requested then
tokens = tokens - requested
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', last_refill)
redis.call('EXPIRE', key, math.ceil(refill_period / 1000) * 2)
return {1, tokens, 0} -- allowed, remaining, retry_after
else
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', last_refill)
redis.call('EXPIRE', key, math.ceil(refill_period / 1000) * 2)
local tokens_needed = requested - tokens
local retry_after = math.ceil((tokens_needed / refill_rate) * refill_period / 1000)
return {0, tokens, retry_after} -- not allowed, remaining, retry_after
end
`;
const result = await this.redis.eval(
script,
1,
key,
config.limit, // capacity
config.limit, // refill_rate (same as capacity for 1:1 refill)
config.window, // refill_period
1, // requested tokens
Date.now()
);
return {
allowed: result[0] === 1,
remaining: result[1],
retryAfter: result[2],
resetTime: null
};
}
async slidingWindowCounterCheck(key, config) {
const script = `
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local current_window = math.floor(now / window)
local previous_window = current_window - 1
local window_progress = (now % window) / window
local current_key = key .. ':' .. current_window
local previous_key = key .. ':' .. previous_window
local current_count = tonumber(redis.call('GET', current_key)) or 0
local previous_count = tonumber(redis.call('GET', previous_key)) or 0
local weighted_count = previous_count * (1 - window_progress) + current_count
if weighted_count < limit then
redis.call('INCR', current_key)
redis.call('EXPIRE', current_key, math.ceil(window / 1000) * 2)
return {1, math.floor(limit - weighted_count - 1)} -- allowed, remaining
else
return {0, 0} -- not allowed, remaining
end
`;
const result = await this.redis.eval(
script,
1,
key,
config.limit,
config.window,
Date.now()
);
const resetTime = Math.ceil(Date.now() / config.window + 1) * config.window;
return {
allowed: result[0] === 1,
remaining: result[1],
retryAfter: result[0] === 0 ? Math.ceil((resetTime - Date.now()) / 1000) : null,
resetTime
};
}
async fixedWindowCheck(key, config) {
const windowStart = Math.floor(Date.now() / config.window) * config.window;
const windowKey = `${key}:${windowStart}`;
const current = await this.redis.incr(windowKey);
if (current === 1) {
// First request in this window, set expiration
await this.redis.expire(windowKey, Math.ceil(config.window / 1000));
}
const resetTime = windowStart + config.window;
if (current <= config.limit) {
return {
allowed: true,
remaining: config.limit - current,
retryAfter: null,
resetTime
};
} else {
return {
allowed: false,
remaining: 0,
retryAfter: Math.ceil((resetTime - Date.now()) / 1000),
resetTime
};
}
}
}
Hierarchical Rate Limiting
class HierarchicalRateLimiter {
constructor(rateLimiter) {
this.limiter = rateLimiter;
this.limits = new Map();
}
defineLimit(name, config) {
this.limits.set(name, config);
}
async checkLimits(keys, options = {}) {
const results = [];
// Check limits in order (most restrictive first)
for (const keyConfig of keys) {
const { key, limitName, weight = 1 } = keyConfig;
const limitConfig = this.limits.get(limitName);
if (!limitConfig) {
throw new Error(`Unknown limit: ${limitName}`);
}
const result = await this.limiter.checkLimit(key, {
...limitConfig,
...options
});
result.limitName = limitName;
result.key = key;
results.push(result);
if (!result.allowed) {
// Return early if any limit is exceeded
return {
allowed: false,
limitExceeded: limitName,
results,
retryAfter: result.retryAfter
};
}
}
return {
allowed: true,
results
};
}
}
// Usage example
const hierarchicalLimiter = new HierarchicalRateLimiter(
new DistributedRateLimiter(redis)
);
// Define different limit types
hierarchicalLimiter.defineLimit('per_user', {
limit: 100,
window: 60000, // 1 minute
algorithm: 'sliding_window_counter'
});
hierarchicalLimiter.defineLimit('per_api_key', {
limit: 1000,
window: 60000
});
hierarchicalLimiter.defineLimit('global', {
limit: 10000,
window: 60000
});
// Check multiple limits for a request
async function handleAPIRequest(req, res) {
const userId = req.user.id;
const apiKey = req.headers['x-api-key'];
const limitCheck = await hierarchicalLimiter.checkLimits([
{ key: `user:${userId}`, limitName: 'per_user' },
{ key: `api_key:${apiKey}`, limitName: 'per_api_key' },
{ key: 'global', limitName: 'global' }
]);
if (!limitCheck.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
limit: limitCheck.limitExceeded,
retryAfter: limitCheck.retryAfter
});
}
// Process request
res.json({ data: 'Success' });
}
Advanced Rate Limiting Patterns
class AdaptiveRateLimiter {
constructor(redis) {
this.redis = redis;
this.baseLimiter = new DistributedRateLimiter(redis);
this.healthThreshold = 0.8; // Reduce limits when system is 80% loaded
}
async checkLimitWithAdaptation(key, baseConfig, context = {}) {
// Get system health metrics
const systemHealth = await this.getSystemHealth();
// Adjust limits based on system health
const adaptedConfig = this.adaptLimitsForHealth(baseConfig, systemHealth);
// Adjust for user reputation
if (context.userReputation) {
adaptedConfig.limit *= this.getReputationMultiplier(context.userReputation);
}
// Adjust for request cost
if (context.operationCost) {
const costMultiplier = Math.max(1, context.operationCost);
return await this.baseLimiter.checkLimit(key, {
...adaptedConfig,
limit: Math.floor(adaptedConfig.limit / costMultiplier)
});
}
return await this.baseLimiter.checkLimit(key, adaptedConfig);
}
adaptLimitsForHealth(config, health) {
if (health.cpu > this.healthThreshold || health.memory > this.healthThreshold) {
// Reduce limits when system is under stress
const reductionFactor = 1 - ((health.cpu + health.memory) / 2 - this.healthThreshold) / (1 - this.healthThreshold) * 0.5;
return {
...config,
limit: Math.floor(config.limit * Math.max(0.1, reductionFactor))
};
}
return config;
}
getReputationMultiplier(reputation) {
// Higher reputation users get higher limits
if (reputation >= 0.9) return 2.0;
if (reputation >= 0.7) return 1.5;
if (reputation >= 0.5) return 1.0;
if (reputation >= 0.3) return 0.7;
return 0.5; // Low reputation users get reduced limits
}
async getSystemHealth() {
// This would integrate with your monitoring system
return {
cpu: Math.random() * 0.9, // Mock CPU usage
memory: Math.random() * 0.8, // Mock memory usage
latency: Math.random() * 100 // Mock average latency
};
}
}
// Circuit breaker integration
class RateLimitedCircuitBreaker {
constructor(rateLimiter, circuitBreaker) {
this.rateLimiter = rateLimiter;
this.circuitBreaker = circuitBreaker;
}
async execute(operation, context) {
// Check rate limit first
const limitResult = await this.rateLimiter.checkLimit(
context.key,
context.limitConfig
);
if (!limitResult.allowed) {
throw new Error(`Rate limit exceeded: ${limitResult.retryAfter}s`);
}
// Then check circuit breaker
return await this.circuitBreaker.execute(operation);
}
}
Deep Dive Analysis
Algorithm Performance Comparison
Algorithm | Memory | Precision | Burst Handling | Implementation |
---|---|---|---|---|
Token Bucket | O(1) per key | High | Excellent | Complex |
Sliding Window Log | O(N) per key | Perfect | Good | Simple |
Fixed Window | O(1) per key | Low | Poor | Very Simple |
Sliding Window Counter | O(1) per key | Good | Good | Moderate |
Rate Limiting Strategies by Use Case
API Rate Limiting:
- Algorithm: Sliding Window Counter
- Granularity: Per user + per API key + global
- Time Windows: Multiple (per minute, per hour, per day)
- Special Handling: Burst allowance for premium users
DDoS Protection:
- Algorithm: Token Bucket (fast rejection)
- Granularity: Per IP + per region
- Time Windows: Short (seconds to minutes)
- Special Handling: Aggressive limiting for suspicious patterns
Resource Management:
- Algorithm: Fixed Window (predictable resource usage)
- Granularity: Per service + per operation type
- Time Windows: Medium (minutes to hours)
- Special Handling: Priority queues for different request types
Common Pitfalls and Solutions
1. Thundering Herd at Window Boundaries
// Problem: All requests reset at same time in fixed windows
function badFixedWindow(userId) {
const windowStart = Math.floor(Date.now() / 60000) * 60000; // Minute boundary
return checkLimit(`user:${userId}:${windowStart}`);
}
// Solution: Stagger windows or use sliding window
function goodSlidingWindow(userId) {
return slidingWindowCounter.isAllowed(userId);
}
// Alternative: Random jitter for window start
function jitteredWindow(userId) {
const jitter = hash(userId) % 60000; // User-specific jitter
const windowStart = Math.floor((Date.now() - jitter) / 60000) * 60000 + jitter;
return checkLimit(`user:${userId}:${windowStart}`);
}
2. Distributed Counter Inconsistency
// Problem: Race conditions in distributed counters
class BrokenDistributedCounter {
async increment(key) {
const current = await this.redis.get(key);
const newValue = (current || 0) + 1;
await this.redis.set(key, newValue); // Race condition!
return newValue;
}
}
// Solution: Use atomic operations
class AtomicDistributedCounter {
async increment(key) {
return await this.redis.incr(key); // Atomic operation
}
async incrementWithLimit(key, limit, ttl) {
const script = `
local current = redis.call('INCR', KEYS[1])
if current == 1 then
redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if current <= tonumber(ARGV[1]) then
return {1, current}
else
return {0, current}
end
`;
return await this.redis.eval(script, 1, key, limit, ttl);
}
}
3. Memory Leaks in Long-Running Processes
// Problem: In-memory rate limiters growing indefinitely
class LeakyRateLimiter {
constructor() {
this.counters = new Map(); // Never cleaned up!
}
}
// Solution: Implement cleanup mechanisms
class ManagedRateLimiter {
constructor() {
this.counters = new Map();
this.lastCleanup = Date.now();
this.cleanupInterval = 60000; // 1 minute
}
checkLimit(key) {
this.maybeCleanup();
// ... rate limiting logic
}
maybeCleanup() {
if (Date.now() - this.lastCleanup > this.cleanupInterval) {
this.cleanup();
this.lastCleanup = Date.now();
}
}
cleanup() {
const now = Date.now();
for (const [key, data] of this.counters.entries()) {
if (now - data.lastAccess > this.cleanupInterval * 2) {
this.counters.delete(key);
}
}
}
}
Interview-Focused Content
Junior Level (2-4 YOE)
Q: What is rate limiting and why is it important? A: Rate limiting controls how many requests a user can make within a time period. It's important to prevent system overload, ensure fair usage among users, protect against attacks like DDoS, and maintain service quality. For example, Twitter limits API calls to prevent abuse.
Q: Explain the difference between Token Bucket and Fixed Window algorithms. A:
- Token Bucket: Maintains a bucket of tokens that refill at a constant rate. Allows burst traffic up to bucket capacity but smooths out the rate over time.
- Fixed Window: Counts requests in fixed time periods (e.g., per minute). Simple but can allow traffic spikes at window boundaries.
Q: How would you implement a simple rate limiter for an API? A: Use a key-value store (like Redis) to track request counts per user per time window. For each request: 1) Check current count for user+window, 2) If under limit, increment and allow, 3) If over limit, reject with 429 status and retry-after header.
Senior Level (5-8 YOE)
Q: How do you implement rate limiting in a distributed system? A: Use a shared state store (Redis) with atomic operations to ensure consistency across instances. Consider:
- Lua scripts for atomic read-modify-write operations
- Consistent hashing for partitioning rate limit data
- Local caching with periodic synchronization for better performance
- Circuit breakers to handle rate limiter failures
Q: What are the challenges of implementing hierarchical rate limiting? A: Challenges include:
- Ordering: Which limits to check first and how to handle dependencies
- Consistency: Ensuring atomic checks across multiple limits
- Performance: Minimizing latency when checking multiple limits
- Fairness: Balancing between different limit types and user tiers
- Configuration: Managing complex limit hierarchies and inheritance
Q: How do you handle rate limiting for different types of operations with varying costs? A: Implement weighted rate limiting:
- Cost-based: Assign weights to operations (simple read=1, complex search=10)
- Resource-aware: Consider CPU, memory, or I/O requirements
- Dynamic adjustment: Adapt costs based on current system load
- User tiers: Different limits for free vs premium users
- Operation classification: Group similar operations for simplified management
Staff+ Level (8+ YOE)
Q: Design a rate limiting system for a global CDN serving millions of requests per second. A: Multi-layered approach:
- Edge-level: Local rate limiting at CDN edges with bloom filters for memory efficiency
- Regional: Aggregated limits across edge nodes within regions
- Global: Eventually consistent global limits with acceptable drift
- Adaptive: Dynamic limit adjustment based on traffic patterns and system health
- Hierarchical: User, origin, and global limits with priority handling
- Monitoring: Real-time visibility into limit usage and system performance
Q: How would you migrate from a simple rate limiter to a sophisticated multi-algorithm system? A: Migration strategy:
- Feature flags: Control rollout and enable easy rollback
- A/B testing: Compare old vs new algorithms with production traffic
- Backward compatibility: Support existing client integrations
- Gradual migration: Start with non-critical endpoints
- Monitoring: Comprehensive metrics to detect issues early
- Failover: Fallback to simple algorithm if sophisticated one fails
Q: What are the implications of rate limiting for system observability and business metrics? A: Key considerations:
- Business impact: Track revenue impact of rate limiting decisions
- User experience: Monitor user frustration and churn due to limits
- System health: Correlation between rate limiting and system performance
- Abuse detection: Identify patterns indicating malicious behavior
- Capacity planning: Use rate limiting data for infrastructure scaling
- SLA compliance: Ensure rate limiting doesn't violate service agreements