Multi-Leader Replication
Core Concept
Multiple writable nodes with conflict resolution strategies
Overview
Multi-leader replication allows multiple nodes to accept writes simultaneously, with changes replicated between all leader nodes. This approach improves write availability and reduces latency for geographically distributed systems, but introduces the complexity of conflict resolution when concurrent writes occur.
This pattern is essential for globally distributed applications that require high write availability and low latency across multiple regions.
Architecture
```
      Region A                      Region B                      Region C
    [Leader A] ←── replication ──→ [Leader B] ←── replication ──→ [Leader C]
         ↓                              ↓                              ↓
   [Follower A1]                  [Follower B1]                  [Follower C1]
   [Follower A2]                  [Follower B2]                  [Follower C2]
```
Key Challenges
1. Conflict Detection and Resolution
```python
import time


class MultiLeaderReplication:
    """Sketch of a multi-leader node using vector clocks to detect conflicts."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.data = {}          # key -> list of (possibly conflicting) versions
        self.vector_clock = {}  # node_id -> logical counter
        self.peers = []         # other MultiLeaderReplication nodes

    def write(self, key, value):
        # Increment the local component of the vector clock
        self.vector_clock[self.node_id] = self.vector_clock.get(self.node_id, 0) + 1

        # Create a versioned entry carrying a snapshot of the clock
        entry = {
            'key': key,
            'value': value,
            'vector_clock': self.vector_clock.copy(),
            'node_id': self.node_id,
            'timestamp': time.time(),
        }

        # Apply locally, then ship to peers
        self.apply_write(entry)
        self.replicate_to_peers(entry)

    def apply_write(self, entry):
        key = entry['key']
        if key not in self.data:
            self.data[key] = []

        # Check for concurrent (conflicting) versions
        conflicts = self.detect_conflicts(key, entry)
        if conflicts:
            # Resolve the conflict using the configured strategy
            resolved = self.resolve_conflict(conflicts + [entry])
            self.data[key] = [resolved]
        else:
            self.data[key].append(entry)

    def detect_conflicts(self, key, new_entry):
        if key not in self.data:
            return []
        return [existing for existing in self.data[key]
                if self.are_concurrent(existing['vector_clock'],
                                       new_entry['vector_clock'])]

    def are_concurrent(self, vc_a, vc_b):
        # Two writes conflict when neither vector clock dominates the other
        nodes = set(vc_a) | set(vc_b)
        a_before_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
        b_before_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
        return not a_before_b and not b_before_a

    def replicate_to_peers(self, entry):
        # Synchronous fan-out for illustration; real systems replicate async
        for peer in self.peers:
            peer.receive_replicated_write(entry)

    def receive_replicated_write(self, entry):
        # Merge the sender's clock into ours, then apply the write
        for node, count in entry['vector_clock'].items():
            self.vector_clock[node] = max(self.vector_clock.get(node, 0), count)
        self.apply_write(entry)

    def resolve_conflict(self, conflicted_entries):
        # Last-writer-wins: keep the entry with the latest wall-clock timestamp
        return max(conflicted_entries, key=lambda e: e['timestamp'])
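```
A quick demonstration of the sketch above: two leaders accept concurrent writes to the same key while partitioned from each other, and exchanging the entries afterwards surfaces the conflict, which the LWW strategy resolves identically on both sides.
```python
a = MultiLeaderReplication('A')
b = MultiLeaderReplication('B')

# Concurrent writes: neither node has seen the other's update yet
a.write('cart', ['book'])    # vector clock on A: {'A': 1}
b.write('cart', ['laptop'])  # vector clock on B: {'B': 1}

# When connectivity returns, each node ships its entry to the other;
# {'A': 1} and {'B': 1} are concurrent, so LWW picks a single winner
# (exact-tie timestamps would need a deterministic tie-break, e.g. node id)
entry_a = a.data['cart'][0]
entry_b = b.data['cart'][0]
a.receive_replicated_write(entry_b)
b.receive_replicated_write(entry_a)

print(a.data['cart'][0]['value'])  # same winning value on both nodes
print(b.data['cart'][0]['value'])
```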
2. Conflict Resolution Strategies
Last-Writer-Wins (LWW):
- Simple timestamp-based resolution
- Risk of silently losing concurrent updates
- Good for systems where losing some updates is acceptable
Multi-Value Resolution:
- Keep all conflicting values as "siblings"
- Let the application resolve conflicts on read
- Used in Amazon's Dynamo design and in Riak
Custom Resolution Logic:
- Application-specific conflict resolution
- Complex but precise
- Example: merging shopping cart items (see the sketch below)
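As a sketch of custom resolution logic (the function name and cart representation are illustrative, not from any particular system), two conflicting cart versions can be merged by taking the union of items and the maximum quantity seen for each:
```python
def merge_carts(cart_a, cart_b):
    """Merge two conflicting cart versions: union of items, max quantity each.

    Carts are plain dicts mapping item_id -> quantity. Note this scheme cannot
    distinguish 'item removed on one side' from 'item never added', which is
    why production systems often use tombstones or OR-Set CRDTs instead.
    """
    merged = dict(cart_a)
    for item, qty in cart_b.items():
        merged[item] = max(merged.get(item, 0), qty)
    return merged

# Two concurrent versions of the same user's cart
print(merge_carts({'book': 1, 'pen': 2}, {'pen': 3, 'mug': 1}))
# {'book': 1, 'pen': 3, 'mug': 1}
```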
Implementation Patterns
Conflict-Free Replicated Data Types (CRDTs)
```python
class GCounterCRDT:
    """Grow-only counter CRDT."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> count

    def increment(self):
        """Increment this node's local counter."""
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1

    def value(self):
        """Current counter value: sum of all per-node counts."""
        return sum(self.counts.values())

    def merge(self, other):
        """Merge with another counter (conflict-free: element-wise max)."""
        result = GCounterCRDT(self.node_id)
        all_nodes = set(self.counts) | set(other.counts)
        for node in all_nodes:
            result.counts[node] = max(self.counts.get(node, 0),
                                      other.counts.get(node, 0))
        return result


class PNCounterCRDT:
    """Increment/decrement counter CRDT built from two grow-only counters."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.increments = GCounterCRDT(node_id)
        self.decrements = GCounterCRDT(node_id)

    def increment(self):
        self.increments.increment()

    def decrement(self):
        # Decrements are modeled as increments of a second G-counter
        self.decrements.increment()

    def value(self):
        return self.increments.value() - self.decrements.value()

    def merge(self, other):
        result = PNCounterCRDT(self.node_id)
        result.increments = self.increments.merge(other.increments)
        result.decrements = self.decrements.merge(other.decrements)
        return result
```
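Because merge is commutative, associative, and idempotent, replicas converge regardless of the order in which they exchange state. A quick demonstration:
```python
a = PNCounterCRDT('A')
b = PNCounterCRDT('B')

a.increment(); a.increment()  # node A sees +2
b.increment(); b.decrement()  # node B sees +1, then -1

# Merge in either direction; both replicas converge to the same value
merged_at_a = a.merge(b)
merged_at_b = b.merge(a)
print(merged_at_a.value(), merged_at_b.value())  # 2 2
```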
Real-World Examples
Amazon DynamoDB Global Tables
```python
# DynamoDB Global Tables handle multi-leader replication automatically;
# conflicts between regions are resolved with last-writer-wins
import time

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('GlobalTable')

# Writes to any region are replicated to all other regions
table.put_item(
    Item={
        'id': '123',
        'data': 'value',
        'last_updated': int(time.time()),
    }
)
```
Apache Cassandra Multi-DC
```sql
-- Create a keyspace replicated across two datacenters
CREATE KEYSPACE mykeyspace
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3,
    'datacenter2': 3
};

-- Consistency level is set by the client per session or statement,
-- not in the CQL text itself. In cqlsh:
CONSISTENCY LOCAL_QUORUM;

INSERT INTO users (id, name, email)
VALUES (1, 'John', 'john@example.com');
```
Performance Considerations
Write Latency:
- Local writes: typically single-digit milliseconds
- Cross-region replication: commonly 50-200ms, depending on geography
- Conflict detection and resolution add overhead that varies with the strategy used
Throughput:
- Write throughput can scale with the number of leader nodes
- Ultimately limited by network bandwidth for replication traffic
- High conflict rates degrade performance
Best Practices
- Minimize Conflicts:
  - Partition data by geography/user so each record has one "home" leader (see the routing sketch below)
  - Use CRDTs when possible
  - Implement application-level coordination
- Monitor Conflicts:
  - Track conflict rates
  - Measure resolution latency
  - Alert on high-conflict scenarios
- Design for Conflicts:
  - Plan conflict resolution strategies up front
  - Test edge cases thoroughly
  - Provide manual resolution tools
- Network Optimization:
  - Compress replication data
  - Batch updates when possible
  - Use dedicated replication channels
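A common conflict-avoidance pattern is to give every record a home region and route its writes there, so concurrent cross-region writes to the same key never happen in the steady state. A minimal sketch, assuming hypothetical region names and a hash-based assignment (both are illustrative, not from any particular system):
```python
import hashlib

# Hypothetical region identifiers; in practice these map to real leader endpoints
REGIONS = ['us-east', 'eu-west', 'ap-south']

def home_region(user_id):
    """Deterministically pin each user to one leader region."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return REGIONS[digest[0] % len(REGIONS)]

def route_write(user_id, key, value, leaders):
    """Send the write to the user's home leader; other regions only replicate."""
    region = home_region(user_id)
    leaders[region].write(f'{user_id}:{key}', value)

print(home_region('alice'))  # always the same region for the same user
```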
Trade-offs
Advantages:
- High write availability
- Low latency for geo-distributed writes
- No single point of failure
- Better performance for global applications
Disadvantages:
- Complex conflict resolution
- Potential data inconsistency
- Higher implementation complexity
- Difficult to reason about consistency
Multi-leader replication is powerful but complex, requiring careful design and conflict resolution strategies to maintain data integrity across distributed systems.