Problem Statement
Design a URL shortening service like bit.ly that:
- Converts long URLs into short, unique identifiers
- Redirects users from short URLs to original URLs
- Handles billions of URLs
- Provides analytics on URL usage
- Ensures high availability and low latency
Requirements
Functional Requirements
- Generate a short and unique alias for a given URL
- Redirect users to the original URL when they access the short link
- Support custom short links
- Track analytics (clicks, geographic location, timestamps)
- Links should have configurable expiration times
Non-Functional Requirements
- High Availability: Service must be highly available (99.99% uptime)
- Low Latency: Redirects should complete with minimal latency
- Scalability: Support billions of URLs and millions of redirects per day
- Durability: Short links should not be lost once created
Extended Requirements
- REST API for integration
- Rate limiting to prevent abuse
- Analytics dashboard
Capacity Estimation
Traffic Estimates
- Read/write ratio: 100:1
- Writes: 500 million new URLs per month = ~200 URLs/second
- Reads: 50 billion redirects per month = ~20K URLs/second
Storage Estimates
- URL Storage: Assume average URL size of 500 bytes
- Monthly Storage: 500 million URLs × 500 bytes = 250 GB/month
- 5-Year Storage: 250 GB × 12 × 5 = 15 TB
Bandwidth Estimates
- Write Bandwidth: 200 URLs/sec × 500 bytes = 100 KB/sec
- Read Bandwidth: 20K URLs/sec × 500 bytes = 10 MB/sec
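The estimates above are easy to sanity-check in a few lines of arithmetic, starting from the 100:1 read/write ratio and 500 million monthly writes:

```python
# Back-of-the-envelope check of the capacity numbers above.
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.6 million

writes_per_month = 500_000_000
reads_per_month = writes_per_month * 100  # 100:1 read/write ratio
url_size_bytes = 500

writes_per_sec = writes_per_month / SECONDS_PER_MONTH  # ~193/s
reads_per_sec = reads_per_month / SECONDS_PER_MONTH    # ~19.3K/s
monthly_storage_gb = writes_per_month * url_size_bytes / 1e9  # 250 GB
five_year_storage_tb = monthly_storage_gb * 12 * 5 / 1000     # 15 TB

print(f"{writes_per_sec:.0f} writes/s, {reads_per_sec:.0f} reads/s")
print(f"{monthly_storage_gb:.0f} GB/month, {five_year_storage_tb:.0f} TB over 5 years")
```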
System APIs
Create Short URL
from datetime import datetime
from typing import Optional

def create_short_url(
    api_key: str,
    original_url: str,
    custom_alias: Optional[str] = None,
    expiration_date: Optional[datetime] = None,
) -> str:
    """
    Create a short URL for the given original URL.

    Args:
        api_key: API key for authentication
        original_url: The long URL to be shortened
        custom_alias: Optional custom short link
        expiration_date: Optional expiration timestamp

    Returns:
        Short URL string (e.g., "https://short.ly/abc123")
    """
    pass
Redirect Short URL
def get_original_url(short_url: str) -> str:
    """
    Retrieve the original URL for redirection.

    Args:
        short_url: The shortened URL

    Returns:
        Original long URL
    """
    pass
Delete Short URL
def delete_short_url(api_key: str, short_url: str) -> bool:
    """
    Delete a short URL.

    Args:
        api_key: API key for authentication
        short_url: The shortened URL to delete

    Returns:
        Success status
    """
    pass
Database Design
URL Table Schema
CREATE TABLE urls (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_key VARCHAR(10) UNIQUE NOT NULL,  -- UNIQUE already indexes short_key
    original_url TEXT NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expiration_date TIMESTAMP NULL,
    is_custom_alias BOOLEAN DEFAULT FALSE,  -- TRUE if the key was user-chosen
    INDEX idx_user_id (user_id)
);
Analytics Table Schema
CREATE TABLE url_analytics (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_key VARCHAR(10) NOT NULL,
    accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    user_ip VARCHAR(45),
    user_agent TEXT,
    referrer TEXT,
    country VARCHAR(2),
    INDEX idx_short_key (short_key),
    INDEX idx_accessed_at (accessed_at)
);
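The analytics schema supports straightforward rollup queries. As an illustration, the sketch below uses SQLite as an in-memory stand-in (its AUTOINCREMENT and index syntax differ from the MySQL DDL above) with invented sample rows, and runs the kind of per-country click query the analytics dashboard would issue:

```python
import sqlite3

# SQLite stand-in for the url_analytics table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE url_analytics (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        short_key TEXT NOT NULL,
        accessed_at TEXT DEFAULT CURRENT_TIMESTAMP,
        country TEXT
    )
""")
rows = [("abc123", "US"), ("abc123", "US"), ("abc123", "DE"), ("xyz789", "US")]
conn.executemany(
    "INSERT INTO url_analytics (short_key, country) VALUES (?, ?)", rows
)

# Clicks per country for one short link.
result = conn.execute("""
    SELECT country, COUNT(*) AS clicks
    FROM url_analytics
    WHERE short_key = 'abc123'
    GROUP BY country
    ORDER BY clicks DESC
""").fetchall()
print(result)  # [('US', 2), ('DE', 1)]
```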
User Table Schema
CREATE TABLE users (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(255) UNIQUE NOT NULL,
    api_key VARCHAR(64) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Key Generation Service
Base62 Encoding Approach
Use Base62 encoding [a-zA-Z0-9] to generate short keys:
import random
import string

class KeyGenerator:
    def __init__(self):
        self.charset = string.ascii_letters + string.digits  # 62 characters

    def generate_key(self, length: int = 7) -> str:
        """
        Generate a random key of the given length.
        7 characters gives 62^7 = ~3.5 trillion combinations.
        """
        return ''.join(random.choices(self.charset, k=length))

    def encode_id(self, num: int) -> str:
        """
        Convert a numeric ID to a Base62 string.
        """
        if num == 0:
            return self.charset[0]
        result = []
        base = len(self.charset)
        while num > 0:
            result.append(self.charset[num % base])
            num //= base
        return ''.join(reversed(result))
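A counter-based scheme also needs the inverse mapping, e.g. to recover the row ID from a short key. A `decode_key` companion (not part of the original service, shown here as a standalone sketch mirroring `encode_id`) round-trips exactly:

```python
import string

CHARSET = string.ascii_letters + string.digits  # same 62-char alphabet as KeyGenerator

def encode_id(num: int) -> str:
    """Base62-encode a numeric ID (mirrors KeyGenerator.encode_id)."""
    if num == 0:
        return CHARSET[0]
    result = []
    while num > 0:
        num, rem = divmod(num, 62)
        result.append(CHARSET[rem])
    return ''.join(reversed(result))

def decode_key(key: str) -> int:
    """Inverse of encode_id: Base62 string back to the numeric ID."""
    num = 0
    for ch in key:
        num = num * 62 + CHARSET.index(ch)
    return num

# Round trip: every ID maps to a unique key and back.
print(encode_id(125))  # "cb" (125 = 2*62 + 1)
assert decode_key(encode_id(125)) == 125
```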
Pre-generated Key Pool
For better performance, maintain a pool of pre-generated keys:
class KeyPoolService:
    def __init__(self, pool_size: int = 1_000_000):
        self.pool_size = pool_size
        self.generator = KeyGenerator()
        self.unused_keys = set()
        self.used_keys = set()

    def initialize_pool(self):
        """Pre-generate unique keys (persisted to a keys table in production)."""
        while len(self.unused_keys) < self.pool_size:
            key = self.generator.generate_key()
            if key not in self.used_keys:
                self.unused_keys.add(key)

    def get_key(self) -> str:
        """Get an unused key from the pool."""
        if len(self.unused_keys) < self.pool_size * 0.1:
            # Fewer than 10% of keys remain: trigger an async refill
            self.refill_pool_async()
        key = self.unused_keys.pop()
        self.used_keys.add(key)
        return key

    def refill_pool_async(self):
        """Kick off a background refill (implementation omitted)."""
        ...
High-Level Design
Components
- Application Servers: Handle API requests
- Key Generation Service: Generate unique short keys
- Database: Store URL mappings (MySQL/PostgreSQL)
- Cache: Redis/Memcached for frequently accessed URLs
- Load Balancer: Distribute traffic across servers
- Analytics Service: Process and store click data
Architecture Diagram
┌─────────────────┐
│ Load Balancer │
└────────┬────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│ App │ │ App │ │ App │
│ Server │ │ Server │ │ Server │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└───────────────────┼───────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌────▼─────┐ ┌────▼─────┐ ┌─────▼──────┐
│ Cache │ │ DB │ │ Analytics │
│ (Redis) │ │ Master │ │ Service │
└──────────┘ └────┬─────┘ └────────────┘
│
┌──────┼──────┐
│ │ │
┌────▼──┐ ┌─▼───┐ ┌▼────┐
│ Slave │ │Slave│ │Slave│
└───────┘ └─────┘ └─────┘
Detailed Design
URL Shortening Flow
- Receive Request: User submits long URL
- Generate Key: Get unique key from Key Generation Service
- Check Collision: Verify key doesn’t exist (rare with good key length)
- Store Mapping: Save URL mapping to database
- Return Short URL: Respond with shortened URL
from datetime import datetime
from typing import Optional

class URLShortenerService:
    def __init__(self, db, cache, key_service):
        self.db = db
        self.cache = cache
        self.key_service = key_service

    def shorten_url(self, original_url: str, user_id: Optional[int] = None,
                    custom_alias: Optional[str] = None) -> str:
        """Shorten a URL."""
        # Check whether the custom alias is available
        if custom_alias:
            if self.is_key_taken(custom_alias):
                raise ValueError("Custom alias already taken")
            short_key = custom_alias
        else:
            short_key = self.key_service.get_key()

        # Store in database
        url_data = {
            'short_key': short_key,
            'original_url': original_url,
            'user_id': user_id,
            'created_at': datetime.now(),
        }
        self.db.insert('urls', url_data)

        # Cache the mapping
        self.cache.set(short_key, original_url, ttl=3600)
        return f"https://short.ly/{short_key}"

    def is_key_taken(self, key: str) -> bool:
        """Check whether a short key already exists."""
        return self.db.query(
            "SELECT 1 FROM urls WHERE short_key = %s", (key,)
        ) is not None
URL Redirection Flow
- Receive Short URL: User clicks short link
- Check Cache: Look up URL in Redis cache
- Query Database: If cache miss, fetch from database
- Update Cache: Store result in cache
- Log Analytics: Async record click data
- Redirect: HTTP 301/302 redirect to original URL
    # Continuing URLShortenerService:
    def redirect_url(self, short_key: str, request_data: dict) -> str:
        """Resolve a short key to its original URL for redirection."""
        # Try cache first (O(1) lookup)
        original_url = self.cache.get(short_key)
        if not original_url:
            # Cache miss: query the database
            url_data = self.db.query(
                "SELECT original_url FROM urls WHERE short_key = %s",
                (short_key,)
            )
            if not url_data:
                raise ValueError("URL not found")
            original_url = url_data['original_url']
            # Update cache
            self.cache.set(short_key, original_url, ttl=3600)

        # Log analytics asynchronously
        self.log_analytics_async(short_key, request_data)
        return original_url
Caching Strategy
Cache-Aside Pattern
- Cache frequently accessed URLs (80/20 rule)
- Use LRU eviction policy
- TTL: 1 hour for normal URLs
- Pre-warm cache for trending URLs
from typing import Optional

class CacheManager:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.default_ttl = 3600  # 1 hour

    def get(self, key: str) -> Optional[str]:
        """Get URL from cache"""
        return self.redis.get(f"url:{key}")

    def set(self, key: str, value: str, ttl: int = None):
        """Set URL in cache"""
        ttl = ttl or self.default_ttl
        self.redis.setex(f"url:{key}", ttl, value)

    def delete(self, key: str):
        """Remove URL from cache"""
        self.redis.delete(f"url:{key}")
Scaling Considerations
Database Partitioning
- Partition Key: Hash of short_key
- Consistent Hashing: Distribute data across shards
- Replication: Master-slave for read scalability
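The consistent-hashing idea above can be sketched as a hash ring with virtual nodes; the shard names here are illustrative, and a production system would handle replication and rebalancing as well:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring mapping short keys to DB shards."""

    def __init__(self, shards, vnodes=100):
        # Several virtual nodes per shard even out the key distribution
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, short_key: str) -> str:
        """First shard clockwise from the key's position on the ring."""
        idx = bisect.bisect(self.points, self._hash(short_key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["db-shard-0", "db-shard-1", "db-shard-2"])
print(ring.shard_for("abc123"))  # deterministic: same shard for a given key
```

Adding or removing one shard only remaps the keys that fell on its ring segments, rather than reshuffling every key as naive `hash(key) % N` would.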
Rate Limiting
import time
import uuid

class RateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client

    def is_allowed(self, api_key: str, limit: int = 100,
                   window: int = 3600) -> bool:
        """
        Check whether a request is within the rate limit,
        using a sliding-window counter in a Redis sorted set.
        """
        now = time.time()
        window_start = now - window

        # Drop entries older than the window
        self.redis.zremrangebyscore(f"rate:{api_key}", 0, window_start)

        # Count requests in the current window
        request_count = self.redis.zcard(f"rate:{api_key}")
        if request_count < limit:
            # Record this request; a unique member per request prevents
            # same-timestamp requests from overwriting each other
            self.redis.zadd(
                f"rate:{api_key}",
                {f"{now}:{uuid.uuid4().hex}": now}
            )
            self.redis.expire(f"rate:{api_key}", window)
            return True
        return False
Security Considerations
- API Keys: Authenticate users to prevent abuse
- Rate Limiting: Prevent DoS attacks
- Input Validation: Sanitize URLs to prevent injection
- HTTPS: Encrypt all communications
- Captcha: For public endpoints
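The input-validation point above can be sketched as a small validator; the scheme allowlist and length cap are illustrative choices, not part of the original spec:

```python
from urllib.parse import urlparse

MAX_URL_LENGTH = 2048                 # illustrative cap
ALLOWED_SCHEMES = {"http", "https"}   # reject javascript:, data:, etc.

def validate_url(url: str) -> bool:
    """Accept only well-formed http(s) URLs before storing them."""
    if not url or len(url) > MAX_URL_LENGTH:
        return False
    parsed = urlparse(url)
    # Require an explicit scheme and host; this rejects javascript: and
    # scheme-relative inputs that could be abused on redirect.
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.netloc)

print(validate_url("https://example.com/page?q=1"))  # True
print(validate_url("javascript:alert(1)"))           # False
```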
Trade-offs and Alternatives
301 vs 302 Redirects
- 301 (Permanent): Cached by browsers, reduces server load but limits analytics
- 302 (Temporary): Better for analytics, slightly higher load
SQL vs NoSQL
- SQL: ACID compliance, complex queries for analytics
- NoSQL: Better horizontal scaling, eventual consistency
Key Generation Strategies
- Random Generation: Simple but requires collision checking
- Hashing: Deterministic but potential collisions
- Counter-based: Sequential, requires distributed counter
- Pre-generated Pool: Fast, requires storage and coordination
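The hashing strategy from the list above can be sketched as follows: derive the key from an MD5 of the URL, Base62-encode it, and truncate to 7 characters. The function name is illustrative, and as noted, collisions are possible after truncation, so a database check is still required before using the key:

```python
import hashlib
import string

CHARSET = string.ascii_letters + string.digits  # Base62 alphabet

def hash_key(original_url: str, length: int = 7) -> str:
    """Derive a short key from the URL's hash. Deterministic, but the
    truncated key can collide, so it must still be checked against the DB."""
    digest = int(hashlib.md5(original_url.encode()).hexdigest(), 16)
    chars = []
    for _ in range(length):
        digest, rem = divmod(digest, 62)
        chars.append(CHARSET[rem])
    return ''.join(chars)

# The same URL always yields the same key (useful for deduplication).
assert hash_key("https://example.com/a") == hash_key("https://example.com/a")
print(hash_key("https://example.com/some/very/long/path"))
```

Determinism is the main advantage over random generation: submitting the same URL twice naturally maps to the same key without a separate lookup table.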
Summary
Key design decisions:
- Use Base62 encoding for short keys (7 characters = 3.5T combinations)
- Pre-generated key pool for performance
- Redis caching of hot URLs (the ~20% of links that serve most redirects)
- Async analytics processing to avoid blocking redirects
- Database partitioning for horizontal scaling
- Rate limiting for security
This design supports billions of URLs with high availability and low latency while providing comprehensive analytics capabilities.