Apache Cassandra

System Architecture

advanced
50-70 minutes
cassandra, nosql, distributed-database, wide-column, high-availability, scalability

Distributed NoSQL database designed for high availability, linear scalability, and handling massive amounts of data across multiple data centers

Overview

Apache Cassandra is a distributed NoSQL database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It addresses the critical challenge of data distribution and consistency in distributed systems at massive scale.

Originally developed at Facebook and later open-sourced, Cassandra has become a standard choice for time-series data, IoT applications, and real-time analytics at companies such as Netflix, Instagram, and Spotify, where deployments manage petabytes of data. Its design centers on linear scalability and tunable, eventually consistent replication.

Key capabilities include:

  • Linear Scalability: Add nodes to increase capacity without downtime
  • High Availability: No single point of failure with multi-datacenter support
  • Tunable Consistency: Choose between strong and eventual consistency per operation
  • Time-Series Optimization: Efficient storage and querying of time-ordered data

Architecture & Core Components

System Architecture

[Diagram: system architecture]

Core Components

1. Node

  • Individual Cassandra server instance
  • Stores data in SSTables and commit logs
  • Participates in gossip protocol for cluster membership
  • Handles read/write operations for assigned token ranges

2. Keyspace

  • Top-level namespace (similar to database in RDBMS)
  • Defines replication strategy and factor
  • Contains multiple column families (tables)

3. Column Family (Table)

  • Collection of rows with similar structure
  • Defined by primary key and clustering columns
  • Stores data in SSTable format

4. Partitioner

  • Determines which node stores each row
  • Murmur3Partitioner: Default, evenly distributes data
  • RandomPartitioner: Legacy, uses MD5 hash
  • ByteOrderedPartitioner: Preserves sort order
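
The partitioner hashes the partition key into a token, and the token range determines which nodes own the row. As a minimal sketch of that mapping, the built-in token() function can be queried directly (this assumes the user_analytics.user_events table defined later in this guide):

-- Show the token the partitioner assigns to each partition key
SELECT user_id, token(user_id)
FROM user_analytics.user_events
LIMIT 10;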

5. Replication Strategy

  • SimpleStrategy: Single datacenter replication
  • NetworkTopologyStrategy: Multi-datacenter aware replication
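
SimpleStrategy places replicas around the ring with no awareness of racks or datacenters, so it is only suitable for single-datacenter development or test clusters; production clusters should use NetworkTopologyStrategy (see the keyspace example in the configuration section). A minimal sketch, with a hypothetical keyspace name:

-- Development / single-datacenter only
CREATE KEYSPACE dev_sandbox
WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 3
};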

Data Model

[Diagram: data model]

Write Path

[Diagram: write path]

Read Path

[Diagram: read path]

Configuration & Deployment

Production Setup

1. Cluster Configuration

# cassandra.yaml
cluster_name: 'production-cluster'
num_tokens: 256
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2

# Network Configuration
listen_address: 10.0.1.100
rpc_address: 10.0.1.100
rpc_port: 9160              # legacy Thrift port; removed in Cassandra 4.0
native_transport_port: 9042

# Data Storage
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches

# Memory Configuration
# Heap sizing is not a cassandra.yaml setting; set it in cassandra-env.sh
# (or jvm.options), e.g. MAX_HEAP_SIZE="8G" and HEAP_NEWSIZE="200M"

2. Keyspace Creation

-- Create keyspace with replication strategy
CREATE KEYSPACE user_analytics 
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3,
    'datacenter2': 2
};

-- Create table with appropriate partitioning
CREATE TABLE user_analytics.user_events (
    user_id UUID,
    event_time TIMESTAMP,
    event_type TEXT,
    event_data TEXT,
    PRIMARY KEY (user_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
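
With this primary key, all events for a user land in a single partition and are clustered newest-first, so per-user time-range reads stay within one partition. A sketch of the intended access pattern (the UUID, TTL, and date values are illustrative):

-- Write an event that expires automatically after 90 days
INSERT INTO user_analytics.user_events (user_id, event_time, event_type, event_data)
VALUES (123e4567-e89b-12d3-a456-426614174000, toTimestamp(now()), 'page_view', '{"page": "/home"}')
USING TTL 7776000;

-- Read the most recent events for one user within a time window
SELECT event_time, event_type, event_data
FROM user_analytics.user_events
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
  AND event_time >= '2024-01-01'
  AND event_time < '2024-02-01'
LIMIT 100;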

3. Multi-Datacenter Setup

# Datacenter 1 - Primary
# cassandra-rackdc.properties
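# (this file is read by GossipingPropertyFileSnitch; set endpoint_snitch accordingly in cassandra.yaml)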
dc=datacenter1
rack=rack1

# Datacenter 2 - Secondary
# cassandra-rackdc.properties
dc=datacenter2
rack=rack1

Resource Requirements

Minimum Requirements

  • CPU: 4 cores (8+ recommended)
  • Memory: 8GB RAM (16GB+ recommended)
  • Storage: SSD with 100GB+ free space
  • Network: 1Gbps (10Gbps for high-throughput)

Production Recommendations

  • CPU: 16+ cores for high-throughput workloads
  • Memory: 32GB+ RAM (50% for heap, 50% for OS cache)
  • Storage: NVMe SSD with 1TB+ capacity
  • Network: 10Gbps+ with low latency

Performance Characteristics

The numbers in this section are indicative ranges only; actual throughput and latency depend heavily on hardware, data model, consistency level, and compaction load.

Throughput Metrics

Write Performance

  • Single Node: 10,000-50,000 writes/second
  • Cluster (10 nodes): 100,000-500,000 writes/second
  • Large Cluster (100+ nodes): 1M+ writes/second
  • Bulk Loading: 100MB-1GB/second per node

Read Performance

  • Single Node: 5,000-25,000 reads/second
  • Cluster (10 nodes): 50,000-250,000 reads/second
  • Point Queries: 1-5ms latency
  • Range Queries: 10-100ms latency

Latency Characteristics

Write Latency

  • P50: 1-3ms
  • P95: 5-15ms
  • P99: 20-100ms
  • P99.9: 100-500ms

Read Latency

  • P50: 1-5ms
  • P95: 10-50ms
  • P99: 50-200ms
  • P99.9: 200ms-1s

Scalability Patterns

Horizontal Scaling

  • Linear Scaling: Add nodes to increase capacity
  • No Downtime: Add/remove nodes without service interruption
  • Auto-Rebalancing: Automatic data redistribution
  • Multi-Datacenter: Global distribution support

Vertical Scaling

  • Memory: Increase heap size for larger caches
  • CPU: More cores for higher concurrency
  • Storage: Larger disks for more data retention
  • Network: Higher bandwidth for better throughput

Operational Considerations

Failure Modes

Node Failures

  • Single Node: Requests reroute to other replicas automatically; no data loss when RF > 1
  • Multiple Nodes: Depends on replication factor
  • Datacenter Failure: Cross-datacenter replication required
  • Network Partition: Split-brain scenarios possible

Data Corruption

  • SSTable Corruption: Automatic repair mechanisms
  • Commit Log Corruption: Data loss possible
  • Replication Inconsistency: Anti-entropy repair needed

Disaster Recovery

Backup Strategies

# Full backup
nodetool snapshot -t backup_$(date +%Y%m%d) keyspace_name

# Incremental backup
# Enable in cassandra.yaml
incremental_backups: true

# Restore from backup
# Stop node, restore SSTables, restart

Multi-Datacenter Recovery

  • Cross-Datacenter Replication: Automatic failover
  • Read Repair: Consistency maintenance
  • Hinted Handoff: Temporary failure handling
  • Anti-Entropy Repair: Long-term consistency

Maintenance Procedures

Node Addition

# 1. Start new node with auto_bootstrap: true
# 2. Monitor streaming progress
nodetool netstats

# 3. Verify cluster health
nodetool status
nodetool ring

Node Removal

# 1. Decommission node
nodetool decommission

# 2. Wait for streaming to complete
nodetool netstats

# 3. Stop and remove node

Schema Changes

-- Add column (non-breaking)
ALTER TABLE user_analytics.user_events 
ADD new_column TEXT;

-- Changing a column's type in place is not supported in modern Cassandra;
-- add a new column with the desired type and backfill it, or migrate to a new table
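
A minimal sketch of that approach (the column names below are hypothetical):

-- Introduce a new column with the desired type alongside the old one
ALTER TABLE user_analytics.user_events ADD event_data_v2 BLOB;

-- Backfill in application code or a batch job, then retire the old column
-- ALTER TABLE user_analytics.user_events DROP event_data;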

Production Best Practices

Configuration Tuning

Memory Optimization

# cassandra-env.sh (heap is configured via JVM options, not cassandra.yaml)
MAX_HEAP_SIZE="16G"    # roughly 50% of RAM, but keep below ~32GB
HEAP_NEWSIZE="4G"      # roughly 25% of heap (CMS collector); ignored with G1GC

# Keeping the heap under 32GB preserves compressed object pointers
JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"

Write Optimization

# Batch size optimization
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50

# Commit log optimization
commitlog_segment_size_in_mb: 32
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

Read Optimization

# Row cache (0 disables it, the safe default; enable only for small, hot, read-mostly rows)
row_cache_size_in_mb: 0
row_cache_save_period: 0

# Key cache for partition keys
key_cache_size_in_mb: 100
key_cache_save_period: 14400

Security Configuration

Authentication & Authorization

# cassandra.yaml: enable authentication and authorization
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer

-- CQL: create login roles (CREATE USER is a legacy alias for CREATE ROLE)
CREATE ROLE admin WITH PASSWORD = 'secure_password' AND LOGIN = true AND SUPERUSER = true;
CREATE ROLE app_user WITH PASSWORD = 'app_password' AND LOGIN = true;

Encryption

# Client-to-node encryption
client_encryption_options:
    enabled: true
    keystore: /etc/cassandra/keystore.jks
    keystore_password: keystore_password

# Node-to-node encryption
server_encryption_options:
    internode_encryption: all
    keystore: /etc/cassandra/keystore.jks
    keystore_password: keystore_password

Monitoring & Alerting

Essential Metrics

# Cluster health
nodetool status
nodetool ring
nodetool info

# Performance metrics
nodetool tpstats
nodetool tablestats    # formerly cfstats
nodetool proxyhistograms

Key Performance Indicators

  • Read/Write Latency: P95, P99 percentiles
  • Throughput: Operations per second
  • Disk Usage: Space utilization per node
  • Memory Usage: Heap and off-heap utilization
  • Network: Bandwidth and connection counts

Alerting Thresholds

  • High Latency: P95 > 100ms
  • Low Throughput: < 50% of baseline
  • Disk Space: > 80% utilization
  • Memory Pressure: > 85% heap usage
  • Node Down: Any node unavailable

Interview-Focused Content

Technology-Specific Questions

Architecture Deep-Dive

Q: Explain Cassandra's write path and how it ensures durability.

A: Cassandra's write path involves multiple steps:

  1. Client Request: The application sends the write to a coordinator node
  2. Replica Selection: The coordinator determines the replica nodes from the partitioner and replication strategy
  3. Replication: The coordinator forwards the write to all replicas in parallel
  4. Commit Log Write: Each replica appends the mutation to its commit log first for durability
  5. Memtable Update: Each replica then applies the write to its in-memory memtable
  6. Acknowledgment: The coordinator responds once the number of replicas required by the consistency level have acknowledged

The commit log ensures durability by persisting writes before acknowledgment, allowing recovery from crashes.

Consistency Models

Q: How does Cassandra handle consistency and what are the trade-offs?

A: Cassandra offers tunable consistency with these levels:

  • ONE: Single replica acknowledgment (fastest, least consistent)
  • QUORUM: Majority of replicas (balanced)
  • ALL: All replicas (slowest, most consistent)
  • LOCAL_QUORUM: Majority in local datacenter
  • EACH_QUORUM: Quorum in each datacenter

Trade-offs:

  • Speed vs Consistency: Lower consistency = faster operations
  • Availability vs Consistency: CAP theorem implications
  • Network Partitions: Higher consistency levels more susceptible to failures
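
In practice the consistency level is chosen per statement or per session. A minimal cqlsh sketch (assuming a replication factor of 3 in the local datacenter): with LOCAL_QUORUM for both writes and reads, R + W > RF, so a read is guaranteed to see the most recent acknowledged write in that datacenter.

-- cqlsh: apply a consistency level to subsequent statements in this session
CONSISTENCY LOCAL_QUORUM;

INSERT INTO user_analytics.user_events (user_id, event_time, event_type, event_data)
VALUES (uuid(), toTimestamp(now()), 'login', '{}');

SELECT * FROM user_analytics.user_events LIMIT 10;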

Operational Questions

Scaling Scenarios

Q: How would you scale a Cassandra cluster from 10 to 100 nodes?

A: Scaling approach:

  1. Capacity Planning: Calculate data distribution and token ranges
  2. Gradual Addition: Add nodes in batches (5-10 at a time)
  3. Monitor Streaming: Watch data movement and cluster health
  4. Load Testing: Verify performance after each batch
  5. Schema Optimization: Adjust replication factors if needed
  6. Network Planning: Ensure sufficient bandwidth for streaming

Key considerations:

  • Token Assignment: Use vnodes for even distribution
  • Streaming Throttle: Control network usage during rebalancing
  • Read Repair: Monitor consistency during scaling

Troubleshooting Scenarios

Q: Your Cassandra cluster is experiencing high read latency. How do you investigate?

A: Investigation approach:

  1. Identify Affected Nodes: Check nodetool proxyhistograms
  2. Analyze Query Patterns: Look for expensive queries or hotspots
  3. Check Resource Usage: CPU, memory, disk I/O
  4. Review Configuration: Cache settings, compaction strategy
  5. Monitor Compaction: Check for compaction backlog
  6. Network Analysis: Look for network bottlenecks

Common causes:

  • Hot Partitions: Uneven data distribution
  • Large Partitions: Too much data per partition
  • Compaction Pressure: SSTable count too high
  • Cache Misses: Insufficient row/key cache
  • Network Issues: High latency between nodes

Design Integration

System Architecture

Q: How would you design a real-time analytics system using Cassandra?

A: Architecture components:

  1. Data Ingestion: Kafka for high-throughput event streaming
  2. Storage Layer: Cassandra for time-series data storage
  3. Query Layer: Application servers with connection pooling
  4. Caching Layer: Redis for hot data caching
  5. Analytics Layer: Spark for batch processing

Data model:

-- Time-series events table, bucketed by day so a single event_type cannot become one huge hot partition
CREATE TABLE analytics.events (
    event_type TEXT,
    event_date DATE,
    event_time TIMESTAMP,
    user_id UUID,
    event_data TEXT,
    PRIMARY KEY ((event_type, event_date), event_time, user_id)
) WITH CLUSTERING ORDER BY (event_time DESC, user_id ASC);

Considerations:

  • Partition Design: Avoid hot partitions
  • TTL Strategy: Automatic data expiration
  • Compaction: TimeWindowCompactionStrategy for time-series
  • Read Patterns: Optimize for time-range queries
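
A hedged sketch of how the TTL and compaction considerations above translate into table options (the window size and TTL values are illustrative, not recommendations):

-- Time-window compaction plus automatic expiry for the events table
ALTER TABLE analytics.events
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
}
AND default_time_to_live = 2592000;  -- 30 days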

Multi-Datacenter Design

Q: How would you design Cassandra for global deployment?

A: Multi-datacenter strategy:

  1. Geographic Distribution: Deploy in multiple regions
  2. Replication Strategy: NetworkTopologyStrategy with regional replication
  3. Consistency Levels: Use LOCAL_QUORUM for low latency
  4. Data Locality: Route reads to local datacenter
  5. Disaster Recovery: Cross-datacenter replication for RTO/RPO

Configuration:

CREATE KEYSPACE global_app 
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3,
    'us-west': 3,
    'eu-west': 3,
    'asia-pacific': 3
};

Trade-off Analysis

Cassandra vs Alternatives

Q: When would you choose Cassandra over DynamoDB or MongoDB?

A: Choose Cassandra when:

  • Scale Requirements: Need to handle petabytes of data
  • Multi-Datacenter: Global deployment with local reads
  • Time-Series Data: Optimized for time-ordered data
  • Cost Control: Want to avoid vendor lock-in
  • Custom Tuning: Need fine-grained control over performance

Choose DynamoDB when:

  • Managed Service: Want fully managed solution
  • Predictable Workloads: Consistent access patterns
  • AWS Ecosystem: Already using AWS services
  • Auto-Scaling: Need automatic scaling without planning

Choose MongoDB when:

  • Complex Queries: Need rich query capabilities
  • Document Model: Natural fit for document data
  • ACID Transactions: Need strong consistency guarantees
  • Developer Experience: Prefer JSON-like data model

Real-World Scenarios

Production Case Studies

Netflix: Global Content Delivery

  • Scale: 100+ billion events per day
  • Architecture: Multi-datacenter Cassandra deployment
  • Use Case: User viewing history, recommendations
  • Challenges: Global consistency, low latency requirements
  • Solutions: Eventual consistency, local reads, anti-entropy repair

Instagram: User Activity Tracking

  • Scale: 500+ million users, billions of interactions
  • Architecture: Time-series optimized Cassandra
  • Use Case: User feeds, activity streams
  • Challenges: Hot partitions, write amplification
  • Solutions: Partition design, compaction tuning, caching

Failure Stories & Lessons

Configuration Mistakes

  • Scenario: Production cluster experiencing data loss
  • Root Cause: Incorrect replication factor configuration
  • Impact: 2 hours of data loss, service degradation
  • Lesson: Always validate the replication strategy before production deployment

Scaling Issues

  • Scenario: Cluster performance degradation after adding nodes
  • Root Cause: Insufficient network bandwidth for streaming
  • Impact: 50% performance drop during rebalancing
  • Lesson: Plan network capacity before scaling operations

Memory Pressure

  • Scenario: Cluster nodes crashing due to out-of-memory errors
  • Root Cause: Heap size too large, leaving insufficient memory for the OS page cache
  • Impact: Service unavailability, data inconsistency
  • Lesson: Balance heap size against OS cache requirements

This comprehensive guide covers Cassandra from basic concepts to advanced production deployment, providing the depth needed for both technical interviews and real-world system design scenarios.

Related Systems

dynamodb
hbase
mongodb
scylladb

Used By

netflix, instagram, spotify, uber, apple, ebay, github, reddit