Apache Cassandra
System Architecture
Distributed NoSQL database designed for high availability, linear scalability, and handling massive amounts of data across multiple data centers
Overview
Apache Cassandra is a distributed NoSQL database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It addresses the critical challenge of data distribution and consistency in distributed systems at massive scale.
Originally developed at Facebook and later open-sourced, Cassandra has become a popular choice for time-series data, IoT applications, and real-time analytics at companies such as Netflix, Instagram, and Spotify. Deployments routinely hold petabytes of data, and the system is designed for linear scalability with tunable, eventually consistent semantics by default.
Key capabilities include:
- Linear Scalability: Add nodes to increase capacity without downtime
- High Availability: No single point of failure with multi-datacenter support
- Tunable Consistency: Choose between strong and eventual consistency per operation
- Time-Series Optimization: Efficient storage and querying of time-ordered data
Architecture & Core Components
System Architecture
System Architecture Diagram
Core Components
1. Node
- Individual Cassandra server instance
- Stores data in SSTables and commit logs
- Participates in gossip protocol for cluster membership
- Handles read/write operations for assigned token ranges
2. Keyspace
- Top-level namespace (similar to database in RDBMS)
- Defines replication strategy and factor
- Contains multiple column families (tables)
3. Column Family (Table)
- Collection of rows with similar structure
- Defined by primary key and clustering columns
- Stores data in SSTable format
4. Partitioner
- Determines which node stores each row
- Murmur3Partitioner: Default, evenly distributes data
- RandomPartitioner: Legacy, uses MD5 hash
- ByteOrderedPartitioner: Preserves sort order for range scans, but prone to hot spots and rarely recommended
5. Replication Strategy
- SimpleStrategy: Single datacenter replication
- NetworkTopologyStrategy: Multi-datacenter aware replication
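The partitioner's token assignment can be inspected from any client with the CQL token() function. A minimal sketch using the DataStax Python driver (an assumption; the contact point, keyspace, and table are the example names used later in this guide):
from cassandra.cluster import Cluster

# Any node can serve as coordinator; the address is a placeholder
cluster = Cluster(['10.0.1.100'])
session = cluster.connect('user_analytics')

# token() returns the value the partitioner (Murmur3 by default) derives from
# the partition key; that token determines which replicas own the row
rows = session.execute("SELECT user_id, token(user_id) FROM user_events LIMIT 3")
for row in rows:
    print(row)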
Data Model
(Data model diagram)
Write Path
(Write path diagram)
Read Path
(Read path diagram)
Configuration & Deployment
Production Setup
1. Cluster Configuration
# cassandra.yaml
cluster_name: 'production-cluster'
num_tokens: 256
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
# Network Configuration
listen_address: 10.0.1.100
rpc_address: 10.0.1.100
rpc_port: 9160            # legacy Thrift port; removed in Cassandra 4.0+
native_transport_port: 9042
# Data Storage
data_file_directories:
- /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
# Memory configuration: the JVM heap is set in cassandra-env.sh (or jvm.options),
# not in cassandra.yaml:
#   MAX_HEAP_SIZE="8G"
#   HEAP_NEWSIZE="200M"
2. Keyspace Creation
-- Create keyspace with replication strategy
CREATE KEYSPACE user_analytics
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3,
'datacenter2': 2
};
-- Create table with appropriate partitioning
CREATE TABLE user_analytics.user_events (
user_id UUID,
event_time TIMESTAMP,
event_type TEXT,
event_data TEXT,
PRIMARY KEY (user_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
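A short usage sketch for the table above with the DataStax Python driver (the driver and contact point are assumptions, not part of the schema):
import uuid
from datetime import datetime, timedelta
from cassandra.cluster import Cluster

cluster = Cluster(['10.0.1.100'])
session = cluster.connect('user_analytics')

# All events for a user share one partition; clustering order keeps newest first
insert = session.prepare(
    "INSERT INTO user_events (user_id, event_time, event_type, event_data) "
    "VALUES (?, ?, ?, ?)")
user_id = uuid.uuid4()
session.execute(insert, (user_id, datetime.utcnow(), 'login', '{}'))

# Time-range read served from a single partition (no cross-node scatter)
select = session.prepare(
    "SELECT event_time, event_type FROM user_events "
    "WHERE user_id = ? AND event_time > ?")
for row in session.execute(select, (user_id, datetime.utcnow() - timedelta(days=7))):
    print(row.event_time, row.event_type)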
3. Multi-Datacenter Setup
# Datacenter 1 - Primary
# cassandra-rackdc.properties
dc=datacenter1
rack=rack1
# Datacenter 2 - Secondary
# cassandra-rackdc.properties
dc=datacenter2
rack=rack1
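# Both files are read by GossipingPropertyFileSnitch; set
# endpoint_snitch: GossipingPropertyFileSnitch in cassandra.yaml on every node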
Resource Requirements
Minimum Requirements
- CPU: 4 cores (8+ recommended)
- Memory: 8GB RAM (16GB+ recommended)
- Storage: SSD with 100GB+ free space
- Network: 1Gbps (10Gbps for high-throughput)
Production Recommendations
- CPU: 16+ cores for high-throughput workloads
- Memory: 32GB+ RAM (heap at most half of RAM and below ~32GB; leave the remainder for the OS page cache)
- Storage: NVMe SSD with 1TB+ capacity
- Network: 10Gbps+ with low latency
Performance Characteristics
The figures below are rough, order-of-magnitude guidelines; actual throughput and latency depend heavily on hardware, data model, consistency level, and payload size.
Throughput Metrics
Write Performance
- Single Node: 10,000-50,000 writes/second
- Cluster (10 nodes): 100,000-500,000 writes/second
- Large Cluster (100+ nodes): 1M+ writes/second
- Bulk Loading: 100MB-1GB/second per node
Read Performance
- Single Node: 5,000-25,000 reads/second
- Cluster (10 nodes): 50,000-250,000 reads/second
- Point Queries: 1-5ms latency
- Range Queries: 10-100ms latency
Latency Characteristics
Write Latency
- P50: 1-3ms
- P95: 5-15ms
- P99: 20-100ms
- P99.9: 100-500ms
Read Latency
- P50: 1-5ms
- P95: 10-50ms
- P99: 50-200ms
- P99.9: 200ms-1s
Scalability Patterns
Horizontal Scaling
- Linear Scaling: Add nodes to increase capacity
- No Downtime: Add/remove nodes without service interruption
- Auto-Rebalancing: Automatic data redistribution
- Multi-Datacenter: Global distribution support
Vertical Scaling
- Memory: Increase heap size for larger caches
- CPU: More cores for higher concurrency
- Storage: Larger disks for more data retention
- Network: Higher bandwidth for better throughput
Operational Considerations
Failure Modes
Node Failures
- Single Node: Surviving replicas keep serving; no data loss as long as the replication factor is greater than one
- Multiple Nodes: Depends on replication factor and consistency level
- Datacenter Failure: Cross-datacenter replication required
- Network Partition: Each side may keep accepting writes; conflicts are reconciled later via last-write-wins timestamps and repair
Data Corruption
- SSTable Corruption: Recover with nodetool scrub or rebuild the affected replicas via repair
- Commit Log Corruption: Data loss possible
- Replication Inconsistency: Anti-entropy repair needed
Disaster Recovery
Backup Strategies
# Full backup
nodetool snapshot -t backup_$(date +%Y%m%d) keyspace_name
# Incremental backup
# Enable in cassandra.yaml
incremental_backups: true
# Restore from backup
# Stop node, restore SSTables, restart
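# Alternatively, stream SSTables back with sstableloader; the source path
# must end in <keyspace>/<table> (address and path below are placeholders)
# sstableloader -d 10.0.1.100 /backups/keyspace_name/table_name/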
Multi-Datacenter Recovery
- Cross-Datacenter Replication: Surviving datacenters continue serving; clients fail over by switching their local datacenter
- Read Repair: Consistency maintenance
- Hinted Handoff: Temporary failure handling
- Anti-Entropy Repair: Long-term consistency
Maintenance Procedures
Node Addition
# 1. Start new node with auto_bootstrap: true
# 2. Monitor streaming progress
nodetool netstats
# 3. Verify cluster health
nodetool status
nodetool ring
Node Removal
# 1. Decommission node
nodetool decommission
# 2. Wait for streaming to complete
nodetool netstats
# 3. Stop and remove node
Schema Changes
-- Add column (non-breaking)
ALTER TABLE user_analytics.user_events
ADD new_column TEXT;
-- Changing a column's type in place is not supported in modern Cassandra;
-- add a new column and backfill, or migrate the data to a new table
Production Best Practices
Configuration Tuning
Memory Optimization
# JVM heap is configured in cassandra-env.sh (or jvm.options), not cassandra.yaml.
# Common guidance: heap no more than half of RAM and below ~32GB so compressed
# ordinary object pointers stay enabled; leave the rest for the OS page cache
MAX_HEAP_SIZE="16G"
# New generation: roughly 25% of heap with CMS; G1 sizes it automatically
HEAP_NEWSIZE="4G"
# jvm.options
-XX:+UseCompressedOops
Write Optimization
# Batch size optimization
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
# Commit log optimization
commitlog_segment_size_in_mb: 32
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
Read Optimization
# Row cache (0 disables it; enable only for small, very hot partitions)
row_cache_size_in_mb: 0
row_cache_save_period: 0
# Key cache for partition keys
key_cache_size_in_mb: 100
key_cache_save_period: 14400
Security Configuration
Authentication & Authorization
# Enable authentication and authorization (cassandra.yaml)
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
-- Create users via CQL as a superuser (newer releases prefer CREATE ROLE ... WITH LOGIN = true)
CREATE USER admin WITH PASSWORD 'secure_password' SUPERUSER;
CREATE USER app_user WITH PASSWORD 'app_password' NOSUPERUSER;
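With PasswordAuthenticator enabled, clients must present credentials. A minimal sketch with the DataStax Python driver (driver and address are assumptions; the role is the app_user created above):
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Credentials for the application role created above
auth = PlainTextAuthProvider(username='app_user', password='app_password')
cluster = Cluster(['10.0.1.100'], auth_provider=auth)
session = cluster.connect()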
Encryption
# Client-to-node encryption
client_encryption_options:
    enabled: true
    keystore: /etc/cassandra/keystore.jks
    keystore_password: keystore_password
# Node-to-node encryption
server_encryption_options:
    internode_encryption: all
    keystore: /etc/cassandra/keystore.jks
    keystore_password: keystore_password
Monitoring & Alerting
Essential Metrics
# Cluster health
nodetool status
nodetool ring
nodetool info
# Performance metrics
nodetool tpstats
nodetool cfstats          # renamed tablestats in newer releases
nodetool proxyhistograms
Key Performance Indicators
- Read/Write Latency: P95, P99 percentiles
- Throughput: Operations per second
- Disk Usage: Space utilization per node
- Memory Usage: Heap and off-heap utilization
- Network: Bandwidth and connection counts
Alerting Thresholds
- High Latency: P95 > 100ms
- Low Throughput: < 50% of baseline
- Disk Space: > 80% utilization
- Memory Pressure: > 85% heap usage
- Node Down: Any node unavailable
Interview-Focused Content
Technology-Specific Questions
Architecture Deep-Dive
Q: Explain Cassandra's write path and how it ensures durability.
A: Cassandra's write path involves multiple steps:
- Client Request: Application sends write to coordinator node
- Replica Selection: Coordinator determines replicas based on partitioner and replication strategy
- Commit Log Write: All writes go to commit log first for durability
- Memtable Update: Data written to in-memory structure (memtable)
- Replication: Write sent to all replica nodes
- Acknowledgment: Coordinator waits for required number of acknowledgments based on consistency level
The commit log ensures durability by persisting writes before acknowledgment, allowing recovery from crashes.
Consistency Models
Q: How does Cassandra handle consistency and what are the trade-offs?
A: Cassandra offers tunable consistency with these levels:
- ONE: Single replica acknowledgment (fastest, least consistent)
- QUORUM: Majority of replicas (balanced)
- ALL: All replicas (slowest, most consistent)
- LOCAL_QUORUM: Majority in local datacenter
- EACH_QUORUM: Quorum in each datacenter
Trade-offs:
- Speed vs Consistency: Lower consistency = faster operations
- Availability vs Consistency: CAP theorem implications
- Network Partitions: Higher consistency levels more susceptible to failures
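Consistency is selected per request by the client. A minimal sketch with the DataStax Python driver (driver, address, and the user_events table from earlier are assumptions used for illustration):
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['10.0.1.100'])
session = cluster.connect('user_analytics')

# Durable write: waits for a majority of replicas to acknowledge
write = SimpleStatement(
    "INSERT INTO user_events (user_id, event_time, event_type, event_data) "
    "VALUES (uuid(), toTimestamp(now()), 'login', '{}')",
    consistency_level=ConsistencyLevel.QUORUM)
session.execute(write)

# Fast, weaker read: a single replica's answer is accepted
read = SimpleStatement(
    "SELECT * FROM user_events LIMIT 10",
    consistency_level=ConsistencyLevel.ONE)
rows = session.execute(read)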
Operational Questions
Scaling Scenarios
Q: How would you scale a Cassandra cluster from 10 to 100 nodes?
A: Scaling approach:
- Capacity Planning: Calculate data distribution and token ranges
- Gradual Addition: Add nodes in batches (5-10 at a time)
- Monitor Streaming: Watch data movement and cluster health
- Load Testing: Verify performance after each batch
- Schema Optimization: Adjust replication factors if needed
- Network Planning: Ensure sufficient bandwidth for streaming
Key considerations:
- Token Assignment: Use vnodes for even distribution
- Streaming Throttle: Control network usage during rebalancing (e.g. nodetool setstreamthroughput)
- Read Repair: Monitor consistency during scaling
Troubleshooting Scenarios
Q: Your Cassandra cluster is experiencing high read latency. How do you investigate?
A: Investigation approach:
- Identify Affected Nodes: Check nodetool proxyhistograms
- Analyze Query Patterns: Look for expensive queries or hotspots
- Check Resource Usage: CPU, memory, disk I/O
- Review Configuration: Cache settings, compaction strategy
- Monitor Compaction: Check for compaction backlog
- Network Analysis: Look for network bottlenecks
Common causes:
- Hot Partitions: Uneven data distribution
- Large Partitions: Too much data per partition
- Compaction Pressure: SSTable count too high
- Cache Misses: Insufficient row/key cache
- Network Issues: High latency between nodes
Design Integration
System Architecture
Q: How would you design a real-time analytics system using Cassandra?
A: Architecture components:
- Data Ingestion: Kafka for high-throughput event streaming
- Storage Layer: Cassandra for time-series data storage
- Query Layer: Application servers with connection pooling
- Caching Layer: Redis for hot data caching
- Analytics Layer: Spark for batch processing
Data model:
-- Time-series events table, bucketed by day so a single event_type
-- does not grow into one unbounded hot partition
CREATE TABLE analytics.events (
    event_type TEXT,
    day DATE,
    timestamp TIMESTAMP,
    user_id UUID,
    event_data TEXT,
    PRIMARY KEY ((event_type, day), timestamp, user_id)
) WITH CLUSTERING ORDER BY (timestamp DESC, user_id ASC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': '1'};
Considerations:
- Partition Design: Avoid hot partitions
- TTL Strategy: Automatic data expiration
- Compaction: TimeWindowCompactionStrategy for time-series
- Read Patterns: Optimize for time-range queries
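As a sketch of the read pattern this model targets, a bounded time-range query stays within a single (event_type, day) partition. The Python driver usage below assumes the bucketed schema shown above (driver and contact point are illustrative):
from datetime import date, datetime
from cassandra.cluster import Cluster

cluster = Cluster(['10.0.1.100'])
session = cluster.connect('analytics')

# One partition per event type per day keeps each read bounded and cacheable
query = session.prepare(
    "SELECT timestamp, user_id, event_data FROM events "
    "WHERE event_type = ? AND day = ? AND timestamp >= ? AND timestamp < ?")
rows = session.execute(query, (
    'page_view', date(2024, 1, 15),
    datetime(2024, 1, 15, 12, 0), datetime(2024, 1, 15, 13, 0)))
for row in rows:
    print(row.timestamp, row.user_id)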
Multi-Datacenter Design
Q: How would you design Cassandra for global deployment?
A: Multi-datacenter strategy:
- Geographic Distribution: Deploy in multiple regions
- Replication Strategy: NetworkTopologyStrategy with regional replication
- Consistency Levels: Use LOCAL_QUORUM for low latency
- Data Locality: Route reads to local datacenter
- Disaster Recovery: Cross-datacenter replication for RTO/RPO
Configuration:
CREATE KEYSPACE global_app
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'us-east': 3,
'us-west': 3,
'eu-west': 3,
'asia-pacific': 3
};
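Local routing and LOCAL_QUORUM are enforced on the client side. A minimal sketch with the DataStax Python driver, assuming the application runs in the us-east datacenter (driver choice and address are assumptions):
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Pin requests to the local datacenter and to the replicas that own each key
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='us-east')),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)

cluster = Cluster(['10.0.1.100'],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect('global_app')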
Trade-off Analysis
Cassandra vs Alternatives
Q: When would you choose Cassandra over DynamoDB or MongoDB?
A: Choose Cassandra when:
- Scale Requirements: Need to handle petabytes of data
- Multi-Datacenter: Global deployment with local reads
- Time-Series Data: Optimized for time-ordered data
- Cost Control: Want to avoid vendor lock-in
- Custom Tuning: Need fine-grained control over performance
Choose DynamoDB when:
- Managed Service: Want fully managed solution
- Predictable Workloads: Consistent access patterns
- AWS Ecosystem: Already using AWS services
- Auto-Scaling: Need automatic scaling without planning
Choose MongoDB when:
- Complex Queries: Need rich query capabilities
- Document Model: Natural fit for document data
- ACID Transactions: Need strong consistency guarantees
- Developer Experience: Prefer JSON-like data model
Real-World Scenarios
Production Case Studies
Netflix: Global Content Delivery
- Scale: 100+ billion events per day
- Architecture: Multi-datacenter Cassandra deployment
- Use Case: User viewing history, recommendations
- Challenges: Global consistency, low latency requirements
- Solutions: Eventual consistency, local reads, anti-entropy repair
Instagram: User Activity Tracking
- Scale: 500+ million users, billions of interactions
- Architecture: Time-series optimized Cassandra
- Use Case: User feeds, activity streams
- Challenges: Hot partitions, write amplification
- Solutions: Partition design, compaction tuning, caching
Failure Stories & Lessons
Configuration Mistakes
- Scenario: Production cluster experiencing data loss
- Root Cause: Incorrect replication factor configuration
- Impact: 2 hours of data loss, service degradation
- Lesson: Always validate replication strategy before production deployment
Scaling Issues
- Scenario: Cluster performance degradation after adding nodes
- Root Cause: Insufficient network bandwidth for streaming
- Impact: 50% performance drop during rebalancing
- Lesson: Plan network capacity for scaling operations
Memory Pressure
- Scenario: Cluster nodes crashing due to out-of-memory errors
- Root Cause: Heap size too large, insufficient OS cache
- Impact: Service unavailability, data inconsistency
- Lesson: Balance heap size with OS cache requirements
This comprehensive guide covers Cassandra from basic concepts to advanced production deployment, providing the depth needed for both technical interviews and real-world system design scenarios.