Redis

System Architecture

intermediate
35-50 minutes
redis, cache, in-memory, nosql, key-value, session-store

In-memory data structure store designed for high-performance caching, sessions, and real-time applications

Redis

Overview

Redis (REmote DIctionary Server) is an open-source, in-memory data structure store designed for high-performance applications requiring sub-millisecond latency. It addresses the critical challenge of providing blazing-fast data access in distributed systems at scale.

Originally developed by Salvatore Sanfilippo in 2009, Redis has become a de facto standard for in-memory caching and session storage at companies like Twitter, GitHub, Instagram, and Stack Overflow. A single instance can serve hundreds of thousands of operations per second (millions across a cluster), and the system is designed for high availability, persistence, and horizontal scaling.

Key capabilities include:

  • Multi-data-structure support: Strings, lists, sets, sorted sets, hashes, streams, and specialized types
  • Sub-millisecond latency: In-memory architecture with optimized data structures
  • Flexible persistence: Configurable durability with RDB snapshots and AOF logging
  • High availability: Master-slave replication, Redis Sentinel, and Redis Cluster
  • Advanced features: Pub/Sub messaging, Lua scripting, transactions, and geospatial operations
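
A minimal redis-py sketch (assuming a local instance on the default port; key names are illustrative) showing a few of these structures side by side:

# Quick tour of core data structures (redis-py, local instance assumed)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Strings: simple values with an optional TTL
r.set("page:home:html", "<html>...</html>", ex=60)

# Hashes: field/value maps, a natural fit for objects
r.hset("user:42", mapping={"name": "Ada", "plan": "pro"})

# Sorted sets: ranked data such as leaderboards
r.zadd("leaderboard", {"ada": 1500, "linus": 1200})
top10 = r.zrevrange("leaderboard", 0, 9, withscores=True)

# Pub/Sub: fire-and-forget messaging to subscribers
r.publish("events", "user:42 upgraded to pro")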

Architecture & Core Components

System Architecture

System Architecture Diagram

Core Components

1. Redis Server (redis-server)

  • Single-threaded event loop: Uses epoll/kqueue for I/O multiplexing
  • Memory allocator: jemalloc for efficient memory management
  • Command processor: Parser and executor for Redis protocol (RESP)
  • Persistence engine: Handles RDB and AOF operations
  • Networking layer: TCP/IP and Unix domain socket support

2. Data Structure Engine

  • Object system: Type-specific optimizations and encoding
  • Memory optimization: Ziplist, intset, and other space-efficient encodings
  • Expiration mechanism: Active and passive key expiration
  • LRU/LFU eviction: Configurable eviction policies
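
The sketch below (local instance and key names assumed for illustration) shows how to observe the encoding Redis chose for a small collection and how key expiration behaves:

# Observe value encodings and key expiration (redis-py, illustrative keys)
import redis

r = redis.Redis(decode_responses=True)

# Small collections use compact encodings (ziplist/listpack depending on
# version); they switch to hashtable/skiplist encodings as they grow
r.hset("small:hash", mapping={"a": "1", "b": "2"})
print(r.object("encoding", "small:hash"))

# TTLs: the key is removed passively on access or by the active
# background expiration cycle
r.set("session:tmp", "data", ex=10)
print(r.ttl("session:tmp"))  # seconds remaining; -2 once the key is gone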

3. Replication System

  • Asynchronous replication: Non-blocking master-slave synchronization
  • Partial resynchronization: Efficient catch-up after network partitions
  • Replica-of-replica: Cascading replication for scaling read operations

4. Cluster Management

  • Hash slot distribution: 16384 slots across cluster nodes
  • Gossip protocol: Node discovery and health monitoring
  • Client-side routing: Smart clients cache slot mappings
  • Resharding: Online slot migration for scaling
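
To make the slot model concrete, the following sketch reimplements the CRC16 (XMODEM) hash and hash-tag rule that cluster-aware clients apply when routing keys; it is a simplified illustration, not any client library's actual code:

# Cluster routing: slot = CRC16(key) mod 16384, honouring {hash tags}
def crc16_xmodem(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]  # hash only the tag between the braces
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:1000"))            # some slot in 0..16383
print(key_slot("{user:1000}:profile"))  # same slot as the line below
print(key_slot("{user:1000}:sessions"))

Keys sharing a hash tag such as {user:1000} land on the same slot, which is what allows multi-key operations on related data in cluster mode.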

Data Flow & Processing Model

System Architecture Diagram

Threading Model

  • Main thread: Handles all client commands and I/O
  • Background threads: Persistence operations, key expiration
  • I/O threads (Redis 6+): Optional multithreading for network I/O
  • Module threads: Custom extensions can use threading

Configuration & Deployment

Production Configuration

Basic Production Setup

# redis.conf - Production Configuration
# Listen on all interfaces; restrict reachability with firewall/VPC rules
bind 0.0.0.0
port 6379
timeout 300
tcp-keepalive 300

# Memory Configuration
maxmemory 8gb
maxmemory-policy allkeys-lru
maxmemory-samples 5

# Security
requirepass "your-strong-password-here"
rename-command FLUSHDB ""
rename-command FLUSHALL ""
rename-command SHUTDOWN SHUTDOWN_REDIS

# Persistence
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes

appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

High Availability Cluster Setup

# Master Node Configuration
port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 15000
cluster-replica-validity-factor 10
cluster-migration-barrier 1
cluster-require-full-coverage yes

# Replica Configuration
replicaof master-ip 7000
replica-serve-stale-data yes
replica-read-only yes
replica-priority 100

Docker Deployment

# docker-compose.yml
version: '3.8'
services:
  redis-master:
    image: redis:7-alpine
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data
      - ./redis.conf:/usr/local/etc/redis/redis.conf
    ports:
      - "6379:6379"
    environment:
      - REDIS_PASSWORD=${REDIS_PASSWORD}
    
  redis-replica:
    image: redis:7-alpine
    command: redis-server --appendonly yes --replicaof redis-master 6379 --masterauth ${REDIS_PASSWORD} --requirepass ${REDIS_PASSWORD}
    depends_on:
      - redis-master
    environment:
      - REDIS_PASSWORD=${REDIS_PASSWORD}

volumes:
  redis-data:

Kubernetes Deployment

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command:
          - redis-server
          - /etc/redis/redis.conf
        ports:
        - containerPort: 6379
        - containerPort: 16379
        volumeMounts:
        - name: config
          mountPath: /etc/redis
        - name: data
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

Networking & Security

  • Ports: 6379 (client), 16379 (cluster bus)
  • TLS encryption: Redis 6+ supports TLS for client and cluster communication
  • Firewall rules: Restrict access to Redis ports
  • VPC configuration: Deploy in private subnets
  • ACL system (Redis 6+): Fine-grained access control

Performance Characteristics

Throughput Metrics

  • Single instance: 100K-500K ops/sec depending on operation type
  • Pipeline mode: 1M+ ops/sec for bulk operations
  • Cluster mode: Linear scaling with additional master nodes
  • Memory bandwidth: Limited by available RAM and CPU cache efficiency
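
Pipelining is how the bulk-operation figures above are typically reached: many commands are written in one round trip and the replies are read back together. A minimal redis-py sketch, assuming a local instance:

# Pipelining with redis-py: buffer many commands, flush them together
import redis

r = redis.Redis()

pipe = r.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC
for i in range(10_000):
    pipe.set(f"bulk:key:{i}", i)
replies = pipe.execute()              # commands sent and replies read in bulk
print(len(replies), "replies")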

Latency Characteristics

Operation Type       | P50     | P95     | P99      | P99.9
---------------------|---------|---------|----------|---------
GET (cache hit)      | <1ms    | <1ms    | 1-2ms    | 2-5ms
SET (no persistence) | <1ms    | 1-2ms   | 2-3ms    | 3-8ms
SET (with AOF)       | 1-2ms   | 2-5ms   | 5-10ms   | 10-20ms
ZADD                 | 1-2ms   | 2-3ms   | 3-5ms    | 5-15ms
Complex operations   | 5-10ms  | 10-50ms | 50-100ms | 100ms+

Resource Utilization Patterns

Memory Usage

  • Data overhead: 20-40% on top of raw data for Redis objects and metadata
  • Fragmentation: 10-30% typical fragmentation ratio
  • Peak usage: 2x normal usage during BGSAVE operations
  • Optimization: Use appropriate data types and compression

CPU Patterns

  • Normal load: 10-30% single-core utilization
  • Peak operations: Can saturate single core at 500K+ ops/sec
  • Background tasks: Additional CPU for persistence and expiration

Network Utilization

  • Bandwidth: Typically 100MB/s - 1GB/s depending on value sizes
  • Connections: 10K+ concurrent connections per instance
  • Protocol overhead: RESP protocol adds ~20% overhead

Scalability Patterns

  • Vertical scaling: Increase memory and CPU (limited by single-threaded nature)
  • Horizontal scaling: Redis Cluster for automatic sharding
  • Read scaling: Master-slave replication for read replicas
  • Geographic distribution: Cross-region replication with Redis Enterprise

Operational Considerations

Failure Modes & Detection

Memory Exhaustion

Symptoms:

  • Commands failing with "OOM" errors
  • Keys being evicted once the maxmemory limit is reached
  • Slow performance due to swap usage

Detection:

# Monitor memory usage
redis-cli INFO memory | grep used_memory_human
redis-cli INFO memory | grep maxmemory_human

# Check for evictions
redis-cli INFO stats | grep evicted_keys

Mitigation:

  • Implement proper maxmemory policies
  • Monitor memory fragmentation
  • Set up alerts for memory thresholds
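
A small monitoring sketch (thresholds are illustrative) that reads INFO memory and INFO stats and flags the symptoms above:

# Poll memory-pressure indicators from INFO (thresholds are illustrative)
import redis

r = redis.Redis()
mem = r.info("memory")
stats = r.info("stats")

used = mem["used_memory"]
limit = mem.get("maxmemory", 0)

if limit and used / limit > 0.9:
    print("WARNING: memory usage above 90% of maxmemory")
if mem.get("mem_fragmentation_ratio", 1.0) > 1.5:
    print("WARNING: high memory fragmentation")
if stats.get("evicted_keys", 0) > 0:
    print(f"NOTE: {stats['evicted_keys']} keys evicted since startup")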

Persistence Failures

Symptoms:

  • BGSAVE/BGREWRITEAOF failures
  • Disk space exhaustion
  • Data loss after restart

Detection:

# Check last save status
redis-cli LASTSAVE
redis-cli INFO persistence

# Monitor disk space
df -h /var/lib/redis

Network Partitions

Symptoms:

  • Master-slave synchronization failures
  • Client connection timeouts
  • Split-brain scenarios in Sentinel setups

Detection:

# Check replication status
redis-cli INFO replication

# Monitor connection status
redis-cli CLIENT LIST | wc -l

Disaster Recovery

Backup Strategies

#!/bin/bash
# Automated backup script
BACKUP_DIR="/backups/redis"
DATE=$(date +%Y%m%d_%H%M%S)

# Create RDB backup and wait for the background save to finish
redis-cli BGSAVE
while redis-cli INFO persistence | grep -q 'rdb_bgsave_in_progress:1'; do
  sleep 1
done
cp /var/lib/redis/dump.rdb $BACKUP_DIR/dump_$DATE.rdb

# Backup AOF file (on Redis 7+ the AOF is a set of files under appendonlydir)
cp /var/lib/redis/appendonly.aof $BACKUP_DIR/aof_$DATE.aof

# Compress and upload to S3
tar -czf $BACKUP_DIR/redis_backup_$DATE.tar.gz $BACKUP_DIR/*_$DATE.*
aws s3 cp $BACKUP_DIR/redis_backup_$DATE.tar.gz s3://my-redis-backups/

Point-in-Time Recovery

# Stop Redis instance
systemctl stop redis

# Restore from backup
cp /backups/redis/dump_20231201_120000.rdb /var/lib/redis/dump.rdb
chown redis:redis /var/lib/redis/dump.rdb

# Start Redis
systemctl start redis

Cross-Region Replication

# Set up cross-region replica (configure auth before pointing at the master)
redis-cli CONFIG SET masterauth "remote-master-password"
redis-cli REPLICAOF remote-master-ip 6379

Maintenance Procedures

Zero-Downtime Upgrades

# 1. Upgrade replica first
systemctl stop redis-replica
# Install new version
systemctl start redis-replica

# 2. Failover to upgraded replica
redis-cli -p 26379 SENTINEL FAILOVER mymaster

# 3. Upgrade former master (now replica)
systemctl stop redis-master
# Install new version
systemctl start redis-master

# 4. Failover back if needed
redis-cli -p 26379 SENTINEL FAILOVER mymaster

Memory Optimization

# Analyze memory usage
redis-cli --bigkeys
redis-cli --memkeys

# Optimize data structures
redis-cli MEMORY USAGE keyname
redis-cli MEMORY DOCTOR

Troubleshooting Guide

High Latency Issues

# Enable latency monitoring
redis-cli CONFIG SET latency-monitor-threshold 100

# Check slow log
redis-cli SLOWLOG GET 10

# Monitor latency events
redis-cli LATENCY LATEST

Connection Issues

# Check connection limits
redis-cli CONFIG GET maxclients

# Monitor client connections
redis-cli CLIENT LIST
redis-cli INFO clients

Replication Lag

# Check replication offset
redis-cli INFO replication | grep offset

# Monitor replication delay
redis-cli WAIT 1 5000  # Wait for 1 replica, timeout 5s

Production Best Practices

Configuration Tuning

Performance Optimization

# Linux kernel tuning
echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
sysctl -p
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Redis-specific tuning
redis-cli CONFIG SET timeout 300
redis-cli CONFIG SET tcp-keepalive 300
redis-cli CONFIG SET maxclients 10000

Memory Management

# Configure eviction policy based on use case
# Cache: allkeys-lru or allkeys-lfu
# Database: noeviction
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Monitor memory fragmentation
redis-cli INFO memory | grep mem_fragmentation_ratio

Security Hardening

Authentication & Authorization

# Redis 6+ ACL system
redis-cli ACL SETUSER cache_user on >password ~cache:* +get +set
redis-cli ACL SETUSER admin_user on >admin_pass ~* +@all

# Traditional password auth
redis-cli CONFIG SET requirepass "strong-password-here"

Network Security

# redis.conf - bind to specific interfaces (not reliably settable at runtime)
bind 127.0.0.1 10.0.1.100

# Enable TLS (Redis 6+)
tls-port 6380
tls-cert-file /etc/redis/tls/redis.crt
tls-key-file /etc/redis/tls/redis.key
tls-ca-cert-file /etc/redis/tls/ca.crt

Monitoring Setup

Essential Metrics

# Memory metrics
used_memory_human
mem_fragmentation_ratio
evicted_keys

# Performance metrics
instantaneous_ops_per_sec
keyspace_hits
keyspace_misses

# Replication metrics
connected_slaves
master_repl_offset
repl_backlog_size
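
These counters can also be pulled programmatically; a redis-py sketch that derives a cache hit rate from them (assuming a reachable instance):

# Derive a cache hit rate and other headline numbers from INFO
import redis

r = redis.Redis()
stats = r.info("stats")
repl = r.info("replication")

hits = stats["keyspace_hits"]
misses = stats["keyspace_misses"]
hit_rate = hits / (hits + misses) if (hits + misses) else 0.0

print(f"ops/sec:  {stats['instantaneous_ops_per_sec']}")
print(f"hit rate: {hit_rate:.2%}")
print(f"replicas: {repl['connected_slaves']}")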

Alerting Rules

# Prometheus alerting rules
groups:
- name: redis
  rules:
  - alert: RedisHighMemoryUsage
    expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.9
    for: 5m
    
  - alert: RedisHighLatency
    expr: redis_slowlog_length > 10
    for: 2m
    
  - alert: RedisReplicationLag
    expr: redis_master_repl_offset - redis_slave_repl_offset > 1000
    for: 3m

Capacity Planning

Growth Projections

# Monitor key growth
redis-cli INFO keyspace | grep keys=

# Memory growth tracking
redis-cli INFO memory | grep used_memory_human

# Calculate storage requirements
# Total Memory = (Data Size × 1.3 fragmentation) × 2 (for BGSAVE)

Scaling Triggers

  • Memory usage > 80%
  • CPU utilization > 70%
  • Network bandwidth > 80%
  • Latency P95 > threshold
  • Connection count > 80% of max

Integration Patterns

Application Integration

# Python Redis client with connection pooling
import redis
from redis.sentinel import Sentinel

# Connection pool for single instance
pool = redis.ConnectionPool(
    host='redis-server',
    port=6379,
    password='password',
    max_connections=100,
    socket_keepalive=True,
    socket_keepalive_options={}
)
redis_client = redis.Redis(connection_pool=pool)

# Sentinel for high availability
sentinel = Sentinel([('sentinel1', 26379), ('sentinel2', 26379)])
master = sentinel.master_for('mymaster', password='password')
slave = sentinel.slave_for('mymaster', password='password')
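
For Redis Cluster deployments, redis-py also ships a cluster-aware client that discovers nodes and caches slot mappings; a minimal sketch with placeholder host names:

# Cluster-aware client: node discovery and slot routing handled by the library
from redis.cluster import RedisCluster

rc = RedisCluster(host="redis-cluster-node1", port=7000, password="password")

# Hash tags keep related keys on one slot, so multi-key operations on them
# remain possible in cluster mode
rc.set("{user:42}:profile", "...")
rc.set("{user:42}:settings", "dark-mode")
print(rc.get("{user:42}:settings"))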

Cache Patterns

# Cache-aside pattern with TTL
# ("database" below stands for the application's data-access layer)
import json

def get_user(user_id):
    cache_key = f"user:{user_id}"
    
    # Try cache first
    cached_user = redis_client.get(cache_key)
    if cached_user:
        return json.loads(cached_user)
    
    # Fetch from database
    user = database.get_user(user_id)
    
    # Cache with TTL
    redis_client.setex(
        cache_key, 
        3600,  # 1 hour TTL
        json.dumps(user)
    )
    
    return user

# Write-through pattern
def update_user(user_id, data):
    # Update database
    database.update_user(user_id, data)
    
    # Update cache
    cache_key = f"user:{user_id}"
    redis_client.setex(cache_key, 3600, json.dumps(data))
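
The simple cache-aside flow above is vulnerable to a thundering herd when a hot key expires; a common mitigation, sketched here with illustrative lock keys and timings (reusing the redis_client and assumed database layer from the block above), is a short-lived SET NX lock around the rebuild:

# Cache-aside with a rebuild lock to avoid a thundering herd
# (lock key, TTLs, and retry behaviour are illustrative)
import json
import time

def get_user_with_lock(user_id):
    cache_key = f"user:{user_id}"
    lock_key = f"lock:{cache_key}"

    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)

    # Only the caller that wins the lock rebuilds; everyone else waits briefly
    if redis_client.set(lock_key, "1", nx=True, ex=10):
        try:
            user = database.get_user(user_id)
            redis_client.setex(cache_key, 3600, json.dumps(user))
            return user
        finally:
            redis_client.delete(lock_key)

    time.sleep(0.05)
    return get_user_with_lock(user_id)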

Interview-Focused Content

Technology-Specific Questions

Junior Level (2-4 YOE)

Q: How does Redis achieve such high performance compared to disk-based databases?

A: Redis achieves high performance through several key design decisions:

  1. In-memory storage: All data is stored in RAM, eliminating disk I/O for read operations
  2. Single-threaded architecture: Avoids context switching and lock contention
  3. Optimized data structures: Uses specialized encodings like ziplist for small collections
  4. Efficient networking: Uses epoll/kqueue for non-blocking I/O multiplexing
  5. Simple protocol: RESP protocol has minimal parsing overhead

Q: What's the difference between RDB and AOF persistence?

A:

  • RDB (Redis Database): Point-in-time snapshots
    • Advantages: Compact, fast restarts, good for backups
    • Disadvantages: Data loss between snapshots, CPU-intensive for large datasets
  • AOF (Append Only File): Logs every write operation
    • Advantages: Better durability, human-readable, automatic rewrite
    • Disadvantages: Larger files, slower restarts

Mid-Level (4-8 YOE)

Q: How would you handle Redis memory optimization in a production environment?

A: Memory optimization involves several strategies:

  1. Data structure optimization: Use hashes for objects, appropriate encodings
  2. TTL management: Set expiration on temporary data
  3. Eviction policies: Configure appropriate maxmemory-policy (allkeys-lru for cache)
  4. Compression: Use Redis compression for large values
  5. Monitoring: Track memory fragmentation and usage patterns
  6. Key naming: Use efficient key naming conventions
  7. Data cleanup: Regular cleanup of expired or unused keys
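
As an example of the first point, storing an object's fields in a single hash with one TTL is usually far cheaper than one string key per field; a sketch with illustrative key names:

# One hash per object versus one string key per field (illustrative names)
import redis

r = redis.Redis(decode_responses=True)

# Compact: one key, small-hash encoding, a single TTL
r.hset("user:1001", mapping={"name": "Ada", "plan": "pro", "theme": "dark"})
r.expire("user:1001", 3600)

# Wasteful: three keys, three TTLs, per-key overhead for each
r.set("user:1001:name", "Ada", ex=3600)
r.set("user:1001:plan", "pro", ex=3600)
r.set("user:1001:theme", "dark", ex=3600)

print(r.memory_usage("user:1001"))       # compare the footprints
print(r.memory_usage("user:1001:name"))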

Q: Explain Redis Cluster and its trade-offs.

A: Redis Cluster provides horizontal scaling through automatic sharding:

Benefits:

  • Automatic data partitioning across nodes
  • High availability with master-replica setup
  • Linear scalability for most operations
  • No single point of failure

Trade-offs:

  • Only supports a subset of Redis commands (no multi-key operations across slots)
  • No support for multiple databases (always uses DB 0)
  • Increased operational complexity
  • Network overhead for cluster communication
  • Resharding requires careful planning

Senior Level (8+ YOE)

Q: Design a Redis-based session store for a microservices architecture handling 1M+ users.

A: For a high-scale session store:

Architecture:

  • Redis Cluster with 6+ nodes (3 masters, 3 replicas)
  • Geographic distribution with cross-region replication
  • Load balancer with consistent hashing for session affinity

Data Model:

Session Key: session:{user_id}:{session_id}
Structure: Hash with fields:
- user_id, roles, permissions, last_activity, preferences
TTL: 30 minutes with sliding window on activity

Implementation:

  • Compression for large session data
  • Atomic operations for session updates
  • Background cleanup for expired sessions
  • Monitoring for memory usage and hit rates
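
A minimal sketch of this session model (helper names are illustrative; fields and the 30-minute sliding TTL follow the data model above):

# Session as a hash with a sliding 30-minute TTL (illustrative helpers)
import json
import time
import redis

r = redis.Redis(decode_responses=True)
SESSION_TTL = 1800  # 30 minutes

def save_session(user_id, session_id, roles, preferences):
    key = f"session:{user_id}:{session_id}"
    r.hset(key, mapping={
        "user_id": user_id,
        "roles": json.dumps(roles),
        "preferences": json.dumps(preferences),
        "last_activity": int(time.time()),
    })
    r.expire(key, SESSION_TTL)

def touch_session(user_id, session_id):
    key = f"session:{user_id}:{session_id}"
    if not r.exists(key):
        return None                       # expired or never existed
    r.hset(key, "last_activity", int(time.time()))
    r.expire(key, SESSION_TTL)            # sliding window on activity
    return r.hgetall(key)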

Scaling Considerations:

  • Horizontal scaling through Redis Cluster
  • Read replicas for session validation
  • Circuit breakers for Redis failures
  • Graceful degradation with temporary local storage

Operational Questions

Q: Your Redis cluster is experiencing high latency. Walk me through your debugging process.

A: Systematic debugging approach:

  1. Immediate Assessment:
    • Check SLOWLOG GET 10 for slow operations
    • Monitor INFO stats for operation patterns
    • Check LATENCY LATEST for latency events
  2. System Resources:
    • Memory usage and fragmentation
    • CPU utilization and system load
    • Network bandwidth and connection count
    • Disk I/O (for persistence operations)
  3. Redis Configuration:
    • Persistence settings (disable if unnecessary)
    • Eviction policy appropriateness
    • Client timeout and keepalive settings
  4. Application Analysis:
    • Query patterns and command types
    • Key access patterns (hot keys)
    • Pipeline and batch operation usage
  5. Infrastructure:
    • Network latency between clients and Redis
    • Hardware limitations (CPU, memory, network)
    • Virtual machine noisy neighbor issues

Q: How do you handle a Redis master failure in production?

A: Redis master failure handling depends on setup:

With Redis Sentinel:

  1. Sentinel detects master failure (quorum-based)
  2. Automatic failover promotes a replica to master
  3. Clients automatically discover new master
  4. Failed master rejoins as replica when recovered

Manual Process:

  1. Identify failure through monitoring alerts
  2. Verify replica is up-to-date
  3. Promote replica: REPLICAOF NO ONE
  4. Update application configuration
  5. Monitor for split-brain scenarios

Prevention:

  • Use Redis Sentinel or Redis Cluster
  • Implement proper monitoring and alerting
  • Test failover procedures regularly
  • Use connection pooling with automatic reconnection

Design Integration

Q: How would you integrate Redis into a high-throughput analytics pipeline?

A: Analytics pipeline integration:

Use Cases:

  1. Real-time counters: User activity, page views, API calls
  2. Session storage: User sessions and temporary state
  3. Cache layer: Frequently accessed analytics results
  4. Rate limiting: API throttling and user quotas

Architecture:

Data Sources → Kafka → Stream Processor → Redis + Database
                ↓
            Real-time Dashboard ← Redis (cache)

Implementation:

  • Use Redis Streams for event processing
  • Atomic counters with INCR/INCRBY
  • Time-series data with sorted sets
  • Cache analytical queries with appropriate TTL
  • Use Redis Cluster for horizontal scaling
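
A compact sketch of those building blocks, with illustrative stream, key, and window names:

# Analytics building blocks: streams, counters, sorted-set time series,
# and a fixed-window rate limiter (names are illustrative)
import time
import redis

r = redis.Redis(decode_responses=True)

# 1. Ingest events into a capped stream for downstream consumers
r.xadd("events:pageviews", {"user": "42", "path": "/home"}, maxlen=100_000)

# 2. Real-time counters
r.incr("counter:pageviews:total")
r.incrby("counter:api_calls:user:42", 1)

# 3. Time-series style data in a sorted set (score = timestamp)
now = int(time.time())
r.zadd("activity:user:42", {f"login:{now}": now})

# 4. Fixed-window rate limit: at most `limit` requests per minute per user
def allow_request(user_id, limit=100):
    window = int(time.time() // 60)
    key = f"ratelimit:{user_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 60)
    return count <= limit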

Considerations:

  • Data durability requirements (persistence settings)
  • Memory management for time-series data
  • Integration with existing analytics tools
  • Monitoring and alerting for pipeline health

Trade-off Analysis

Q: When would you choose Redis over Memcached?

A: Redis vs Memcached decision factors:

Choose Redis when:

  • Need rich data structures (lists, sets, sorted sets)
  • Require persistence capabilities
  • Need pub/sub messaging
  • Want atomic operations and transactions
  • Require master-slave replication
  • Need Lua scripting capabilities

Choose Memcached when:

  • Simple key-value caching is sufficient
  • Multi-threaded performance is critical
  • Memory efficiency is paramount
  • Simpler operational model preferred
  • No persistence requirements

Specific Scenarios:

  • Session storage: Redis (persistence, data structures)
  • Simple page caching: Either (Redis for persistence)
  • High-throughput simple cache: Memcached (multi-threading)
  • Real-time features: Redis (pub/sub, data structures)

Troubleshooting Scenarios

Q: Redis memory usage keeps growing despite setting maxmemory. What could be wrong?

A: Several potential issues:

  1. Eviction Policy Issues:
    • Policy set to noeviction (prevents key removal)
    • Insufficient keys matching eviction criteria
    • TTL not set on keys
  2. Memory Fragmentation:
    • High fragmentation ratio (>1.5)
    • Memory not being returned to OS
    • Need to restart or use MEMORY PURGE
  3. Persistence Operations:
    • BGSAVE creating memory spikes
    • AOF rewrite consuming additional memory
    • Child processes doubling memory usage
  4. Large Objects:
    • Keys exceeding reasonable size limits
    • Unbounded collections (lists, sets)
    • Memory overhead for small objects

Debugging Steps:

# Check memory details
redis-cli INFO memory

# Identify large keys
redis-cli --bigkeys

# Check eviction settings
redis-cli CONFIG GET maxmemory*

# Monitor fragmentation
redis-cli INFO memory | grep fragmentation

Resolution:

  • Adjust eviction policy to allkeys-lru
  • Set TTL on appropriate keys
  • Optimize data structures
  • Schedule BGSAVE during low-traffic periods
  • Consider Redis Cluster for scaling
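
When TTL-less or unbounded keys are the root cause, a SCAN pass can backfill expirations incrementally without blocking the server the way KEYS would; a sketch with an illustrative key pattern:

# Backfill TTLs on keys that never got one (pattern is illustrative);
# SCAN iterates incrementally and avoids blocking the server like KEYS
import redis

r = redis.Redis()

for key in r.scan_iter(match="cache:*", count=500):
    if r.ttl(key) == -1:   # key exists but has no expiration
        r.expire(key, 3600)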

Related Systems

memcached
hazelcast
apache-ignite
amazon-elasticache

Used By

Twitter, GitHub, Instagram, Stack Overflow, Pinterest, Airbnb