Redis

System Architecture

intermediate
35-50 minutes
redis, cache, in-memory, nosql, key-value, session-store

In-memory data structure store designed for high-performance caching, sessions, and real-time applications

Redis

Overview

Redis (REmote DIctionary Server) is an open-source, in-memory data structure store designed for high-performance applications requiring sub-millisecond latency. It addresses the critical challenge of providing blazing-fast data access in distributed systems at scale.

Originally developed by Salvatore Sanfilippo in 2009, Redis has become a de facto standard for in-memory caching and session storage at companies like Twitter, GitHub, Instagram, and Stack Overflow. A single instance can serve hundreds of thousands of operations per second (millions across a cluster), and the system is designed for high availability, persistence, and horizontal scaling.

Key capabilities include:

  • Multi-data-structure support: Strings, lists, sets, sorted sets, hashes, streams, and specialized types
  • Sub-millisecond latency: In-memory architecture with optimized data structures
  • Flexible persistence: Configurable durability with RDB snapshots and AOF logging
  • High availability: Master-slave replication, Redis Sentinel, and Redis Cluster
  • Advanced features: Pub/Sub messaging, Lua scripting, transactions, and geospatial operations
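
A minimal redis-py sketch (assuming a local instance on the default port; key names are illustrative) showing a few of these structures side by side:

# Quick tour of core data structures (redis-py, local instance assumed)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Strings: simple values with an optional TTL
r.set("page:home:html", "<html>...</html>", ex=60)

# Hashes: field/value maps, a natural fit for objects
r.hset("user:42", mapping={"name": "Ada", "plan": "pro"})

# Sorted sets: ranked data such as leaderboards
r.zadd("leaderboard", {"ada": 1500, "linus": 1200})
top10 = r.zrevrange("leaderboard", 0, 9, withscores=True)

# Pub/Sub: fire-and-forget messaging to subscribers
r.publish("events", "user:42 upgraded to pro")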

Architecture & Core Components

System Architecture

System Architecture Diagram

Core Components

1. Redis Server (redis-server)

  • Single-threaded event loop: Uses epoll/kqueue for I/O multiplexing
  • Memory allocator: jemalloc for efficient memory management
  • Command processor: Parser and executor for Redis protocol (RESP)
  • Persistence engine: Handles RDB and AOF operations
  • Networking layer: TCP/IP and Unix domain socket support

2. Data Structure Engine

  • Object system: Type-specific optimizations and encoding
  • Memory optimization: Ziplist, intset, and other space-efficient encodings
  • Expiration mechanism: Active and passive key expiration
  • LRU/LFU eviction: Configurable eviction policies
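
The sketch below (local instance and key names assumed for illustration) shows how to observe the encoding Redis chose for a small collection and how key expiration behaves:

# Observe value encodings and key expiration (redis-py, illustrative keys)
import redis

r = redis.Redis(decode_responses=True)

# Small collections use compact encodings (ziplist/listpack depending on
# version); they switch to hashtable/skiplist encodings as they grow
r.hset("small:hash", mapping={"a": "1", "b": "2"})
print(r.object("encoding", "small:hash"))

# TTLs: the key is removed passively on access or by the active
# background expiration cycle
r.set("session:tmp", "data", ex=10)
print(r.ttl("session:tmp"))  # seconds remaining; -2 once the key is gone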

3. Replication System

  • Asynchronous replication: Non-blocking master-slave synchronization
  • Partial resynchronization: Efficient catch-up after network partitions
  • Replica-of-replica: Cascading replication for scaling read operations

4. Cluster Management

  • Hash slot distribution: 16384 slots across cluster nodes
  • Gossip protocol: Node discovery and health monitoring
  • Client-side routing: Smart clients cache slot mappings
  • Resharding: Online slot migration for scaling
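
To make the slot model concrete, the following sketch reimplements the CRC16 (XMODEM) hash and hash-tag rule that cluster-aware clients apply when routing keys; it is a simplified illustration, not any client library's actual code:

# Cluster routing: slot = CRC16(key) mod 16384, honouring {hash tags}
def crc16_xmodem(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]  # hash only the tag between the braces
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:1000"))            # some slot in 0..16383
print(key_slot("{user:1000}:profile"))  # same slot as the line below
print(key_slot("{user:1000}:sessions"))

Keys sharing a hash tag such as {user:1000} land on the same slot, which is what allows multi-key operations on related data in cluster mode.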

Data Flow & Processing Model

System Architecture Diagram

Threading Model

  • Main thread: Handles all client commands and I/O
  • Background threads: Persistence operations, key expiration
  • I/O threads (Redis 6+): Optional multithreading for network I/O
  • Module threads: Custom extensions can use threading

Configuration & Deployment

Production Configuration

Basic Production Setup

# redis.conf - Production Configuration
# Listen on all interfaces; restrict reachability with firewall/VPC rules
bind 0.0.0.0
port 6379
timeout 300
tcp-keepalive 300

# Memory Configuration
maxmemory 8gb
maxmemory-policy allkeys-lru
maxmemory-samples 5

# Security
requirepass "your-strong-password-here"
rename-command FLUSHDB ""
rename-command FLUSHALL ""
rename-command SHUTDOWN SHUTDOWN_REDIS

# Persistence
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes

appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

High Availability Cluster Setup

# Master Node Configuration
port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 15000
cluster-replica-validity-factor 10
cluster-migration-barrier 1
cluster-require-full-coverage yes

# Replica Configuration
replicaof master-ip 7000
replica-serve-stale-data yes
replica-read-only yes
replica-priority 100

Docker Deployment

# docker-compose.yml
version: '3.8'
services:
  redis-master:
    image: redis:7-alpine
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data
      - ./redis.conf:/usr/local/etc/redis/redis.conf
    ports:
      - "6379:6379"
    environment:
      - REDIS_PASSWORD=${REDIS_PASSWORD}
    
  redis-replica:
    image: redis:7-alpine
    command: redis-server --appendonly yes --replicaof redis-master 6379 --masterauth ${REDIS_PASSWORD} --requirepass ${REDIS_PASSWORD}
    depends_on:
      - redis-master
    environment:
      - REDIS_PASSWORD=${REDIS_PASSWORD}

volumes:
  redis-data:

Kubernetes Deployment

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command:
          - redis-server
          - /etc/redis/redis.conf
        ports:
        - containerPort: 6379
        - containerPort: 16379
        volumeMounts:
        - name: config
          mountPath: /etc/redis
        - name: data
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

Networking & Security

  • Ports: 6379 (client), 16379 (cluster bus)
  • TLS encryption: Redis 6+ supports TLS for client and cluster communication
  • Firewall rules: Restrict access to Redis ports
  • VPC configuration: Deploy in private subnets
  • ACL system (Redis 6+): Fine-grained access control

Performance Characteristics

Throughput Metrics

  • Single instance: 100K-500K ops/sec depending on operation type
  • Pipeline mode: 1M+ ops/sec for bulk operations
  • Cluster mode: Linear scaling with additional master nodes
  • Memory bandwidth: Limited by available RAM and CPU cache efficiency
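
Pipelining is how the bulk-operation figures above are typically reached: many commands are written in one round trip and the replies are read back together. A minimal redis-py sketch, assuming a local instance:

# Pipelining with redis-py: buffer many commands, flush them together
import redis

r = redis.Redis()

pipe = r.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC
for i in range(10_000):
    pipe.set(f"bulk:key:{i}", i)
replies = pipe.execute()              # commands sent and replies read in bulk
print(len(replies), "replies")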

Latency Characteristics

Operation Type       | P50     | P95     | P99      | P99.9
---------------------|---------|---------|----------|---------
GET (cache hit)      | <1ms    | <1ms    | 1-2ms    | 2-5ms
SET (no persistence) | <1ms    | 1-2ms   | 2-3ms    | 3-8ms
SET (with AOF)       | 1-2ms   | 2-5ms   | 5-10ms   | 10-20ms
ZADD                 | 1-2ms   | 2-3ms   | 3-5ms    | 5-15ms
Complex operations   | 5-10ms  | 10-50ms | 50-100ms | 100ms+

Resource Utilization Patterns

Memory Usage

  • Data overhead: 20-40% on top of raw data for Redis objects and metadata
  • Fragmentation: 10-30% typical fragmentation ratio
  • Peak usage: 2x normal usage during BGSAVE operations
  • Optimization: Use appropriate data types and compression

CPU Patterns

  • Normal load: 10-30% single-core utilization
  • Peak operations: Can saturate single core at 500K+ ops/sec
  • Background tasks: Additional CPU for persistence and expiration

Network Utilization

  • Bandwidth: Typically 100MB/s - 1GB/s depending on value sizes
  • Connections: 10K+ concurrent connections per instance
  • Protocol overhead: RESP protocol adds ~20% overhead

Scalability Patterns

  • Vertical scaling: Increase memory and CPU (limited by single-threaded nature)
  • Horizontal scaling: Redis Cluster for automatic sharding
  • Read scaling: Master-slave replication for read replicas
  • Geographic distribution: Cross-region replication with Redis Enterprise

Operational Considerations

Failure Modes & Detection

Memory Exhaustion

Symptoms:

  • Commands failing with "OOM" errors
  • Keys being evicted once the maxmemory limit is reached
  • Slow performance due to swap usage

Detection:

# Monitor memory usage
redis-cli INFO memory | grep used_memory_human
redis-cli INFO memory | grep maxmemory_human

# Check for evictions
redis-cli INFO stats | grep evicted_keys

Mitigation:

  • Implement proper maxmemory policies
  • Monitor memory fragmentation
  • Set up alerts for memory thresholds
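
A small monitoring sketch (thresholds are illustrative) that reads INFO memory and INFO stats and flags the symptoms above:

# Poll memory-pressure indicators from INFO (thresholds are illustrative)
import redis

r = redis.Redis()
mem = r.info("memory")
stats = r.info("stats")

used = mem["used_memory"]
limit = mem.get("maxmemory", 0)

if limit and used / limit > 0.9:
    print("WARNING: memory usage above 90% of maxmemory")
if mem.get("mem_fragmentation_ratio", 1.0) > 1.5:
    print("WARNING: high memory fragmentation")
if stats.get("evicted_keys", 0) > 0:
    print(f"NOTE: {stats['evicted_keys']} keys evicted since startup")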

Persistence Failures

Symptoms:

  • BGSAVE/BGREWRITEAOF failures
  • Disk space exhaustion
  • Data loss after restart

Detection:

# Check last save status
redis-cli LASTSAVE
redis-cli INFO persistence

# Monitor disk space
df -h /var/lib/redis

Network Partitions

Symptoms:

  • Master-slave synchronization failures
  • Client connection timeouts
  • Split-brain scenarios in Sentinel setups

Detection:

# Check replication status
redis-cli INFO replication

# Monitor connection status
redis-cli CLIENT LIST | wc -l

Disaster Recovery

Backup Strategies

#!/bin/bash
# Automated backup script
BACKUP_DIR="/backups/redis"
DATE=$(date +%Y%m%d_%H%M%S)

# Create RDB backup and wait for the background save to finish
redis-cli BGSAVE
while redis-cli INFO persistence | grep -q 'rdb_bgsave_in_progress:1'; do
  sleep 1
done
cp /var/lib/redis/dump.rdb $BACKUP_DIR/dump_$DATE.rdb

# Backup AOF file (on Redis 7+ the AOF is a set of files under appendonlydir)
cp /var/lib/redis/appendonly.aof $BACKUP_DIR/aof_$DATE.aof

# Compress and upload to S3
tar -czf $BACKUP_DIR/redis_backup_$DATE.tar.gz $BACKUP_DIR/*_$DATE.*
aws s3 cp $BACKUP_DIR/redis_backup_$DATE.tar.gz s3://my-redis-backups/

Point-in-Time Recovery

# Stop Redis instance
systemctl stop redis

# Restore from backup
cp /backups/redis/dump_20231201_120000.rdb /var/lib/redis/dump.rdb
chown redis:redis /var/lib/redis/dump.rdb

# Start Redis
systemctl start redis

Cross-Region Replication

# Set up cross-region replica (configure auth before pointing at the master)
redis-cli CONFIG SET masterauth "remote-master-password"
redis-cli REPLICAOF remote-master-ip 6379

Maintenance Procedures

Zero-Downtime Upgrades

# 1. Upgrade replica first
systemctl stop redis-replica
# Install new version
systemctl start redis-replica

# 2. Failover to upgraded replica
redis-cli -p 26379 SENTINEL FAILOVER mymaster

# 3. Upgrade former master (now replica)
systemctl stop redis-master
# Install new version
systemctl start redis-master

# 4. Failover back if needed
redis-cli -p 26379 SENTINEL FAILOVER mymaster

Memory Optimization

# Analyze memory usage
redis-cli --bigkeys
redis-cli --memkeys

# Optimize data structures
redis-cli MEMORY USAGE keyname
redis-cli MEMORY DOCTOR

Troubleshooting Guide

High Latency Issues

# Enable latency monitoring
redis-cli CONFIG SET latency-monitor-threshold 100

# Check slow log
redis-cli SLOWLOG GET 10

# Monitor latency events
redis-cli LATENCY LATEST

Connection Issues

# Check connection limits
redis-cli CONFIG GET maxclients

# Monitor client connections
redis-cli CLIENT LIST
redis-cli INFO clients

Replication Lag

# Check replication offset
redis-cli INFO replication | grep offset

# Monitor replication delay
redis-cli WAIT 1 5000  # Wait for 1 replica, timeout 5s

Production Best Practices

Configuration Tuning

Performance Optimization

# Linux kernel tuning
echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
sysctl -p
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Redis-specific tuning
redis-cli CONFIG SET timeout 300
redis-cli CONFIG SET tcp-keepalive 300
redis-cli CONFIG SET maxclients 10000

Memory Management

# Configure eviction policy based on use case
# Cache: allkeys-lru or allkeys-lfu
# Database: noeviction
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Monitor memory fragmentation
redis-cli INFO memory | grep mem_fragmentation_ratio

Security Hardening

Authentication & Authorization

# Redis 6+ ACL system
redis-cli ACL SETUSER cache_user on >password ~cache:* +get +set
redis-cli ACL SETUSER admin_user on >admin_pass ~* +@all

# Traditional password auth
redis-cli CONFIG SET requirepass "strong-password-here"

Network Security

# redis.conf - bind to specific interfaces (not reliably settable at runtime)
bind 127.0.0.1 10.0.1.100

# Enable TLS (Redis 6+)
tls-port 6380
tls-cert-file /etc/redis/tls/redis.crt
tls-key-file /etc/redis/tls/redis.key
tls-ca-cert-file /etc/redis/tls/ca.crt

Monitoring Setup

Essential Metrics

# Memory metrics
used_memory_human
mem_fragmentation_ratio
evicted_keys

# Performance metrics
instantaneous_ops_per_sec
keyspace_hits
keyspace_misses

# Replication metrics
connected_slaves
master_repl_offset
repl_backlog_size
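
These counters can also be pulled programmatically; a redis-py sketch that derives a cache hit rate from them (assuming a reachable instance):

# Derive a cache hit rate and other headline numbers from INFO
import redis

r = redis.Redis()
stats = r.info("stats")
repl = r.info("replication")

hits = stats["keyspace_hits"]
misses = stats["keyspace_misses"]
hit_rate = hits / (hits + misses) if (hits + misses) else 0.0

print(f"ops/sec:  {stats['instantaneous_ops_per_sec']}")
print(f"hit rate: {hit_rate:.2%}")
print(f"replicas: {repl['connected_slaves']}")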

Alerting Rules

# Prometheus alerting rules
groups:
- name: redis
  rules:
  - alert: RedisHighMemoryUsage
    expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.9
    for: 5m
    
  - alert: RedisHighLatency
    expr: redis_slowlog_length > 10
    for: 2m
    
  - alert: RedisReplicationLag
    expr: redis_master_repl_offset - redis_slave_repl_offset > 1000
    for: 3m

Capacity Planning

Growth Projections

# Monitor key growth
redis-cli INFO keyspace | grep keys=

# Memory growth tracking
redis-cli INFO memory | grep used_memory_human

# Calculate storage requirements
# Total Memory = (Data Size × 1.3 fragmentation) × 2 (for BGSAVE)

Scaling Triggers

  • Memory usage > 80%
  • CPU utilization > 70%
  • Network bandwidth > 80%
  • Latency P95 > threshold
  • Connection count > 80% of max

Integration Patterns

Application Integration

# Python Redis client with connection pooling
import redis
from redis.sentinel import Sentinel

# Connection pool for single instance
pool = redis.ConnectionPool(
    host='redis-server',
    port=6379,
    password='password',
    max_connections=100,
    socket_keepalive=True,
    socket_keepalive_options={}
)
redis_client = redis.Redis(connection_pool=pool)

# Sentinel for high availability
sentinel = Sentinel([('sentinel1', 26379), ('sentinel2', 26379)])
master = sentinel.master_for('mymaster', password='password')
slave = sentinel.slave_for('mymaster', password='password')
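
For Redis Cluster deployments, redis-py also ships a cluster-aware client that discovers nodes and caches slot mappings; a minimal sketch with placeholder host names:

# Cluster-aware client: node discovery and slot routing handled by the library
from redis.cluster import RedisCluster

rc = RedisCluster(host="redis-cluster-node1", port=7000, password="password")

# Hash tags keep related keys on one slot, so multi-key operations on them
# remain possible in cluster mode
rc.set("{user:42}:profile", "...")
rc.set("{user:42}:settings", "dark-mode")
print(rc.get("{user:42}:settings"))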

Cache Patterns

# Cache-aside pattern with TTL
# ("database" below stands for the application's data-access layer)
import json

def get_user(user_id):
    cache_key = f"user:{user_id}"
    
    # Try cache first
    cached_user = redis_client.get(cache_key)
    if cached_user:
        return json.loads(cached_user)
    
    # Fetch from database
    user = database.get_user(user_id)
    
    # Cache with TTL
    redis_client.setex(
        cache_key, 
        3600,  # 1 hour TTL
        json.dumps(user)
    )
    
    return user

# Write-through pattern
def update_user(user_id, data):
    # Update database
    database.update_user(user_id, data)
    
    # Update cache
    cache_key = f"user:{user_id}"
    redis_client.setex(cache_key, 3600, json.dumps(data))
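
The simple cache-aside flow above is vulnerable to a thundering herd when a hot key expires; a common mitigation, sketched here with illustrative lock keys and timings (reusing the redis_client and assumed database layer from the block above), is a short-lived SET NX lock around the rebuild:

# Cache-aside with a rebuild lock to avoid a thundering herd
# (lock key, TTLs, and retry behaviour are illustrative)
import json
import time

def get_user_with_lock(user_id):
    cache_key = f"user:{user_id}"
    lock_key = f"lock:{cache_key}"

    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)

    # Only the caller that wins the lock rebuilds; everyone else waits briefly
    if redis_client.set(lock_key, "1", nx=True, ex=10):
        try:
            user = database.get_user(user_id)
            redis_client.setex(cache_key, 3600, json.dumps(user))
            return user
        finally:
            redis_client.delete(lock_key)

    time.sleep(0.05)
    return get_user_with_lock(user_id)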

Interview-Focused Content

Technology-Specific Questions

Junior Level (2-4 YOE)

Q: How does Redis achieve such high performance compared to disk-based databases?

A: Redis achieves high performance through several key design decisions:

  1. In-memory storage: All data is stored in RAM, eliminating disk I/O for read operations
  2. Single-threaded architecture: Avoids context switching and lock contention
  3. Optimized data structures: Uses specialized encodings like ziplist for small collections
  4. Efficient networking: Uses epoll/kqueue for non-blocking I/O multiplexing
  5. Simple protocol: RESP protocol has minimal parsing overhead

Q: What's the difference between RDB and AOF persistence?

A:

  • RDB (Redis Database): Point-in-time snapshots
    • Advantages: Compact, fast restarts, good for backups
    • Disadvantages: Data loss between snapshots, CPU-intensive for large datasets
  • AOF (Append Only File): Logs every write operation
    • Advantages: Better durability, human-readable, automatic rewrite
    • Disadvantages: Larger files, slower restarts

Mid-Level (4-8 YOE)

Q: How would you handle Redis memory optimization in a production environment?

A: Memory optimization involves several strategies:

  1. Data structure optimization: Use hashes for objects, appropriate encodings
  2. TTL management: Set expiration on temporary data
  3. Eviction policies: Configure appropriate maxmemory-policy (allkeys-lru for cache)
  4. Compression: Use Redis compression for large values
  5. Monitoring: Track memory fragmentation and usage patterns
  6. Key naming: Use efficient key naming conventions
  7. Data cleanup: Regular cleanup of expired or unused keys
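
As an example of the first point, storing an object's fields in a single hash with one TTL is usually far cheaper than one string key per field; a sketch with illustrative key names:

# One hash per object versus one string key per field (illustrative names)
import redis

r = redis.Redis(decode_responses=True)

# Compact: one key, small-hash encoding, a single TTL
r.hset("user:1001", mapping={"name": "Ada", "plan": "pro", "theme": "dark"})
r.expire("user:1001", 3600)

# Wasteful: three keys, three TTLs, per-key overhead for each
r.set("user:1001:name", "Ada", ex=3600)
r.set("user:1001:plan", "pro", ex=3600)
r.set("user:1001:theme", "dark", ex=3600)

print(r.memory_usage("user:1001"))       # compare the footprints
print(r.memory_usage("user:1001:name"))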

Q: Explain Redis Cluster and its trade-offs.

A: Redis Cluster provides horizontal scaling through automatic sharding:

Benefits:

  • Automatic data partitioning across nodes
  • High availability with master-replica setup
  • Linear scalability for most operations
  • No single point of failure

Trade-offs:

  • Only supports a subset of Redis commands (no multi-key operations across slots)
  • No support for multiple databases (always uses DB 0)
  • Increased operational complexity
  • Network overhead for cluster communication
  • Resharding requires careful planning

Senior Level (8+ YOE)

Q: Design a Redis-based session store for a microservices architecture handling 1M+ users.

A: For a high-scale session store:

Architecture:

  • Redis Cluster with 6+ nodes (3 masters, 3 replicas)
  • Geographic distribution with cross-region replication
  • Load balancer with consistent hashing for session affinity

Data Model:

Session Key: session:{user_id}:{session_id}
Structure: Hash with fields:
- user_id, roles, permissions, last_activity, preferences
TTL: 30 minutes with sliding window on activity

Implementation:

  • Compression for large session data
  • Atomic operations for session updates
  • Background cleanup for expired sessions
  • Monitoring for memory usage and hit rates
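
A minimal sketch of this session model (helper names are illustrative; fields and the 30-minute sliding TTL follow the data model above):

# Session as a hash with a sliding 30-minute TTL (illustrative helpers)
import json
import time
import redis

r = redis.Redis(decode_responses=True)
SESSION_TTL = 1800  # 30 minutes

def save_session(user_id, session_id, roles, preferences):
    key = f"session:{user_id}:{session_id}"
    r.hset(key, mapping={
        "user_id": user_id,
        "roles": json.dumps(roles),
        "preferences": json.dumps(preferences),
        "last_activity": int(time.time()),
    })
    r.expire(key, SESSION_TTL)

def touch_session(user_id, session_id):
    key = f"session:{user_id}:{session_id}"
    if not r.exists(key):
        return None                       # expired or never existed
    r.hset(key, "last_activity", int(time.time()))
    r.expire(key, SESSION_TTL)            # sliding window on activity
    return r.hgetall(key)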

Scaling Considerations:

  • Horizontal scaling through Redis Cluster
  • Read replicas for session validation
  • Circuit breakers for Redis failures
  • Graceful degradation with temporary local storage

Operational Questions

Q: Your Redis cluster is experiencing high latency. Walk me through your debugging process.

A: Systematic debugging approach:

  1. Immediate Assessment:
    • Check SLOWLOG GET 10 for slow operations
    • Monitor INFO stats for operation patterns
    • Check LATENCY LATEST for latency events
  2. System Resources:
    • Memory usage and fragmentation
    • CPU utilization and system load
    • Network bandwidth and connection count
    • Disk I/O (for persistence operations)
  3. Redis Configuration:
    • Persistence settings (disable if unnecessary)
    • Eviction policy appropriateness
    • Client timeout and keepalive settings
  4. Application Analysis:
    • Query patterns and command types
    • Key access patterns (hot keys)
    • Pipeline and batch operation usage
  5. Infrastructure:
    • Network latency between clients and Redis
    • Hardware limitations (CPU, memory, network)
    • Virtual machine noisy neighbor issues

Q: How do you handle a Redis master failure in production?

A: Redis master failure handling depends on setup:

With Redis Sentinel:

  1. Sentinel detects master failure (quorum-based)
  2. Automatic failover promotes a replica to master
  3. Clients automatically discover new master
  4. Failed master rejoins as replica when recovered

Manual Process:

  1. Identify failure through monitoring alerts
  2. Verify replica is up-to-date
  3. Promote replica: REPLICAOF NO ONE
  4. Update application configuration
  5. Monitor for split-brain scenarios

Prevention:

  • Use Redis Sentinel or Redis Cluster
  • Implement proper monitoring and alerting
  • Test failover procedures regularly
  • Use connection pooling with automatic reconnection

Design Integration

Q: How would you integrate Redis into a high-throughput analytics pipeline?

A: Analytics pipeline integration:

Use Cases:

  1. Real-time counters: User activity, page views, API calls
  2. Session storage: User sessions and temporary state
  3. Cache layer: Frequently accessed analytics results
  4. Rate limiting: API throttling and user quotas

Architecture:

Data Sources → Kafka → Stream Processor → Redis + Database
                ↓
            Real-time Dashboard ← Redis (cache)

Implementation:

  • Use Redis Streams for event processing
  • Atomic counters with INCR/INCRBY
  • Time-series data with sorted sets
  • Cache analytical queries with appropriate TTL
  • Use Redis Cluster for horizontal scaling
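
A compact sketch of those building blocks, with illustrative stream, key, and window names:

# Analytics building blocks: streams, counters, sorted-set time series,
# and a fixed-window rate limiter (names are illustrative)
import time
import redis

r = redis.Redis(decode_responses=True)

# 1. Ingest events into a capped stream for downstream consumers
r.xadd("events:pageviews", {"user": "42", "path": "/home"}, maxlen=100_000)

# 2. Real-time counters
r.incr("counter:pageviews:total")
r.incrby("counter:api_calls:user:42", 1)

# 3. Time-series style data in a sorted set (score = timestamp)
now = int(time.time())
r.zadd("activity:user:42", {f"login:{now}": now})

# 4. Fixed-window rate limit: at most `limit` requests per minute per user
def allow_request(user_id, limit=100):
    window = int(time.time() // 60)
    key = f"ratelimit:{user_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 60)
    return count <= limit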

Considerations:

  • Data durability requirements (persistence settings)
  • Memory management for time-series data
  • Integration with existing analytics tools
  • Monitoring and alerting for pipeline health

Trade-off Analysis

Q: When would you choose Redis over Memcached?

A: Redis vs Memcached decision factors:

Choose Redis when:

  • Need rich data structures (lists, sets, sorted sets)
  • Require persistence capabilities
  • Need pub/sub messaging
  • Want atomic operations and transactions
  • Require master-slave replication
  • Need Lua scripting capabilities

Choose Memcached when:

  • Simple key-value caching is sufficient
  • Multi-threaded performance is critical
  • Memory efficiency is paramount
  • Simpler operational model preferred
  • No persistence requirements

Specific Scenarios:

  • Session storage: Redis (persistence, data structures)
  • Simple page caching: Either (Redis for persistence)
  • High-throughput simple cache: Memcached (multi-threading)
  • Real-time features: Redis (pub/sub, data structures)

Troubleshooting Scenarios

Q: Redis memory usage keeps growing despite setting maxmemory. What could be wrong?

A: Several potential issues:

  1. Eviction Policy Issues:
    • Policy set to noeviction (prevents key removal)
    • Insufficient keys matching eviction criteria
    • TTL not set on keys
  2. Memory Fragmentation:
    • High fragmentation ratio (>1.5)
    • Memory not being returned to OS
    • Need to restart or use MEMORY PURGE
  3. Persistence Operations:
    • BGSAVE creating memory spikes
    • AOF rewrite consuming additional memory
    • Child processes doubling memory usage
  4. Large Objects:
    • Keys exceeding reasonable size limits
    • Unbounded collections (lists, sets)
    • Memory overhead for small objects

Debugging Steps:

# Check memory details
redis-cli INFO memory

# Identify large keys
redis-cli --bigkeys

# Check eviction settings
redis-cli CONFIG GET maxmemory*

# Monitor fragmentation
redis-cli INFO memory | grep fragmentation

Resolution:

  • Adjust eviction policy to allkeys-lru
  • Set TTL on appropriate keys
  • Optimize data structures
  • Schedule BGSAVE during low-traffic periods
  • Consider Redis Cluster for scaling
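
When TTL-less or unbounded keys are the root cause, a SCAN pass can backfill expirations incrementally without blocking the server the way KEYS would; a sketch with an illustrative key pattern:

# Backfill TTLs on keys that never got one (pattern is illustrative);
# SCAN iterates incrementally and avoids blocking the server like KEYS
import redis

r = redis.Redis()

for key in r.scan_iter(match="cache:*", count=500):
    if r.ttl(key) == -1:   # key exists but has no expiration
        r.expire(key, 3600)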

Related Systems

memcached
hazelcast
apache-ignite
amazon-elasticache

Used By

Twitter, GitHub, Instagram, Stack Overflow, Pinterest, Airbnb