URL Shortener

Learn how to design a scalable URL shortening service with high availability, low latency, and analytics capabilities.

Problem Statement

Design a URL shortening service like bit.ly that:

  • Converts long URLs into short, unique identifiers
  • Redirects users from short URLs to original URLs
  • Handles billions of URLs
  • Provides analytics on URL usage
  • Ensures high availability and low latency

Requirements

Functional Requirements

  1. Generate a short and unique alias for a given URL
  2. Redirect users to the original URL when they access the short link
  3. Support custom short links
  4. Track analytics (clicks, geographic location, timestamps)
  5. Links should have configurable expiration times

Non-Functional Requirements

  1. High Availability: Service must be highly available (99.99% uptime)
  2. Low Latency: Redirects should happen in real-time with minimal latency
  3. Scalability: Support billions of URLs and millions of redirects per day
  4. Durability: Short links should not be lost once created

Extended Requirements

  • REST API for integration
  • Rate limiting to prevent abuse
  • Analytics dashboard

Capacity Estimation

Traffic Estimates

  • Reads: 100:1 read-to-write ratio
  • Writes: 500 million new URLs per month = ~200 URLs/second
  • Reads: 100 billion redirects per month = ~40K URLs/second

Storage Estimates

  • URL Storage: Assume average URL size of 500 bytes
  • Monthly Storage: 500 million URLs × 500 bytes = 250 GB/month
  • 5-Year Storage: 250 GB × 12 × 5 = 15 TB

Bandwidth Estimates

  • Write Bandwidth: 200 URLs/sec × 500 bytes = 100 KB/sec
  • Read Bandwidth: 40K URLs/sec × 500 bytes = 20 MB/sec

System APIs

Create Short URL

def create_short_url(
    api_key: str,
    original_url: str,
    custom_alias: Optional[str] = None,
    expiration_date: Optional[datetime] = None
) -> str:
    """
    Create a short URL for the given original URL.
    
    Args:
        api_key: API key for authentication
        original_url: The long URL to be shortened
        custom_alias: Optional custom short link
        expiration_date: Optional expiration timestamp
        
    Returns:
        Short URL string (e.g., "https://short.ly/abc123")
    """
    pass

Redirect Short URL

def get_original_url(
    short_url: str
) -> str:
    """
    Retrieve the original URL for redirection.
    
    Args:
        short_url: The shortened URL
        
    Returns:
        Original long URL
    """
    pass

Delete Short URL

def delete_short_url(
    api_key: str,
    short_url: str
) -> bool:
    """
    Delete a short URL.
    
    Args:
        api_key: API key for authentication
        short_url: The shortened URL to delete
        
    Returns:
        Success status
    """
    pass

Database Design

URL Table Schema

CREATE TABLE urls (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_key VARCHAR(10) UNIQUE NOT NULL,
    original_url TEXT NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expiration_date TIMESTAMP,
    custom_alias BOOLEAN DEFAULT FALSE,
    INDEX idx_short_key (short_key),
    INDEX idx_user_id (user_id)
);

Analytics Table Schema

CREATE TABLE url_analytics (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_key VARCHAR(10) NOT NULL,
    accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    user_ip VARCHAR(45),
    user_agent TEXT,
    referrer TEXT,
    country VARCHAR(2),
    INDEX idx_short_key (short_key),
    INDEX idx_accessed_at (accessed_at)
);

User Table Schema

CREATE TABLE users (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(255) UNIQUE NOT NULL,
    api_key VARCHAR(64) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Key Generation Service

Base62 Encoding Approach

Use Base62 encoding [a-zA-Z0-9] to generate short keys:

import random
import string

class KeyGenerator:
    def __init__(self):
        self.charset = string.ascii_letters + string.digits  # 62 characters
        
    def generate_key(self, length: int = 7) -> str:
        """
        Generate a random key of specified length.
        7 characters gives us 62^7 = ~3.5 trillion combinations
        """
        return ''.join(random.choices(self.charset, k=length))
    
    def encode_id(self, id: int) -> str:
        """
        Convert numeric ID to Base62 string.
        """
        if id == 0:
            return self.charset[0]
        
        result = []
        base = len(self.charset)
        
        while id > 0:
            result.append(self.charset[id % base])
            id //= base
            
        return ''.join(reversed(result))

Pre-generated Key Pool

For better performance, maintain a pool of pre-generated keys:

class KeyPoolService:
    def __init__(self, pool_size: int = 1000000):
        self.pool_size = pool_size
        self.generator = KeyGenerator()
        self.unused_keys = set()
        self.used_keys = set()
        
    def initialize_pool(self):
        """Pre-generate keys and store in database"""
        while len(self.unused_keys) < self.pool_size:
            key = self.generator.generate_key()
            if key not in self.used_keys:
                self.unused_keys.add(key)
    
    def get_key(self) -> str:
        """Get an unused key from the pool"""
        if len(self.unused_keys) < self.pool_size * 0.1:
            # Trigger async refill when pool is 90% used
            self.refill_pool_async()
        
        key = self.unused_keys.pop()
        self.used_keys.add(key)
        return key

High-Level Design

Components

  1. Application Servers: Handle API requests
  2. Key Generation Service: Generate unique short keys
  3. Database: Store URL mappings (MySQL/PostgreSQL)
  4. Cache: Redis/Memcached for frequently accessed URLs
  5. Load Balancer: Distribute traffic across servers
  6. Analytics Service: Process and store click data

Architecture Diagram

                    ┌─────────────────┐
                    │  Load Balancer  │
                    └────────┬────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
   ┌────▼─────┐        ┌────▼─────┐        ┌────▼─────┐
   │  App     │        │  App     │        │  App     │
   │  Server  │        │  Server  │        │  Server  │
   └────┬─────┘        └────┬─────┘        └────┬─────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
   ┌────▼─────┐       ┌────▼─────┐      ┌─────▼──────┐
   │  Cache   │       │    DB    │      │ Analytics  │
   │  (Redis) │       │  Master  │      │  Service   │
   └──────────┘       └────┬─────┘      └────────────┘
                           │
                    ┌──────┼──────┐
                    │      │      │
               ┌────▼──┐ ┌─▼───┐ ┌▼────┐
               │ Slave │ │Slave│ │Slave│
               └───────┘ └─────┘ └─────┘

Detailed Design

URL Shortening Flow

  1. Receive Request: User submits long URL
  2. Generate Key: Get unique key from Key Generation Service
  3. Check Collision: Verify key doesn’t exist (rare with good key length)
  4. Store Mapping: Save URL mapping to database
  5. Return Short URL: Respond with shortened URL
class URLShortenerService:
    def __init__(self, db, cache, key_service):
        self.db = db
        self.cache = cache
        self.key_service = key_service
    
    def shorten_url(self, original_url: str, user_id: int = None, 
                    custom_alias: str = None) -> str:
        """Shorten a URL"""
        
        # Check if custom alias is available
        if custom_alias:
            if self.is_key_taken(custom_alias):
                raise ValueError("Custom alias already taken")
            short_key = custom_alias
        else:
            short_key = self.key_service.get_key()
        
        # Store in database
        url_data = {
            'short_key': short_key,
            'original_url': original_url,
            'user_id': user_id,
            'created_at': datetime.now()
        }
        self.db.insert('urls', url_data)
        
        # Cache the mapping
        self.cache.set(short_key, original_url, ttl=3600)
        
        return f"https://short.ly/{short_key}"

URL Redirection Flow

  1. Receive Short URL: User clicks short link
  2. Check Cache: Look up URL in Redis cache
  3. Query Database: If cache miss, fetch from database
  4. Update Cache: Store result in cache
  5. Log Analytics: Async record click data
  6. Redirect: HTTP 301/302 redirect to original URL
def redirect_url(self, short_key: str, request_data: dict) -> str:
    """Redirect to original URL"""
    
    # Try cache first (O(1) lookup)
    original_url = self.cache.get(short_key)
    
    if not original_url:
        # Cache miss - query database
        url_data = self.db.query(
            "SELECT original_url FROM urls WHERE short_key = %s",
            (short_key,)
        )
        
        if not url_data:
            raise ValueError("URL not found")
        
        original_url = url_data['original_url']
        
        # Update cache
        self.cache.set(short_key, original_url, ttl=3600)
    
    # Log analytics asynchronously
    self.log_analytics_async(short_key, request_data)
    
    return original_url

Caching Strategy

Cache-Aside Pattern

  • Cache frequently accessed URLs (80/20 rule)
  • Use LRU eviction policy
  • TTL: 1 hour for normal URLs
  • Pre-warm cache for trending URLs
class CacheManager:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.default_ttl = 3600  # 1 hour
    
    def get(self, key: str) -> Optional[str]:
        """Get URL from cache"""
        return self.redis.get(f"url:{key}")
    
    def set(self, key: str, value: str, ttl: int = None):
        """Set URL in cache"""
        ttl = ttl or self.default_ttl
        self.redis.setex(f"url:{key}", ttl, value)
    
    def delete(self, key: str):
        """Remove URL from cache"""
        self.redis.delete(f"url:{key}")

Scaling Considerations

Database Partitioning

  • Partition Key: Hash of short_key
  • Consistent Hashing: Distribute data across shards
  • Replication: Master-slave for read scalability

Rate Limiting

class RateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client
    
    def is_allowed(self, api_key: str, limit: int = 100, 
                   window: int = 3600) -> bool:
        """
        Check if request is within rate limit.
        Using sliding window counter.
        """
        current = int(time.time())
        window_start = current - window
        
        # Remove old entries
        self.redis.zremrangebyscore(
            f"rate:{api_key}", 
            0, 
            window_start
        )
        
        # Count requests in current window
        request_count = self.redis.zcard(f"rate:{api_key}")
        
        if request_count < limit:
            # Add current request
            self.redis.zadd(
                f"rate:{api_key}", 
                {current: current}
            )
            self.redis.expire(f"rate:{api_key}", window)
            return True
        
        return False

Security Considerations

  1. API Keys: Authenticate users to prevent abuse
  2. Rate Limiting: Prevent DoS attacks
  3. Input Validation: Sanitize URLs to prevent injection
  4. HTTPS: Encrypt all communications
  5. Captcha: For public endpoints

Trade-offs and Alternatives

301 vs 302 Redirects

  • 301 (Permanent): Cached by browsers, reduces server load but limits analytics
  • 302 (Temporary): Better for analytics, slightly higher load

SQL vs NoSQL

  • SQL: ACID compliance, complex queries for analytics
  • NoSQL: Better horizontal scaling, eventual consistency

Key Generation Strategies

  1. Random Generation: Simple but requires collision checking
  2. Hashing: Deterministic but potential collisions
  3. Counter-based: Sequential, requires distributed counter
  4. Pre-generated Pool: Fast, requires storage and coordination

Summary

Key design decisions:

  • Use Base62 encoding for short keys (7 characters = 3.5T combinations)
  • Pre-generated key pool for performance
  • Redis caching for 80% of redirects
  • Async analytics processing to avoid blocking redirects
  • Database partitioning for horizontal scaling
  • Rate limiting for security

This design supports billions of URLs with high availability and low latency while providing comprehensive analytics capabilities.