Multi-Leader Replication

Core Concept

advanced
25-35 minutes
replication, multi-master, conflicts, geo-distribution, availability, consistency

Multiple writable nodes with conflict resolution strategies

Multi-Leader Replication

Overview

Multi-leader replication allows multiple nodes to accept writes simultaneously, with changes replicated between all leader nodes. This approach improves write availability and reduces latency for geographically distributed systems, but introduces the complexity of conflict resolution when concurrent writes occur.

This pattern is essential for globally distributed applications that require high write availability and low latency across multiple regions.

Architecture

Region A                Region B                Region C
[Leader A] ←→ replication ←→ [Leader B] ←→ replication ←→ [Leader C]
    ↓                          ↓                          ↓
[Follower A1]              [Follower B1]              [Follower C1]
[Follower A2]              [Follower B2]              [Follower C2]

Key Challenges

1. Conflict Detection and Resolution

import time


class MultiLeaderReplication:
    def __init__(self, node_id):
        self.node_id = node_id
        self.data = {}          # key -> list of (possibly conflicting) versioned entries
        self.vector_clock = {}  # node_id -> logical counter
        self.peers = []         # other leader nodes

    def write(self, key, value):
        # Increment local clock
        self.vector_clock[self.node_id] = self.vector_clock.get(self.node_id, 0) + 1

        # Create versioned entry
        entry = {
            'key': key,
            'value': value,
            'vector_clock': self.vector_clock.copy(),
            'node_id': self.node_id,
            'timestamp': time.time()
        }

        # Apply locally
        self.apply_write(entry)

        # Replicate to peers
        self.replicate_to_peers(entry)

    def replicate_to_peers(self, entry):
        # Simplified synchronous fan-out; real systems replicate asynchronously
        for peer in self.peers:
            peer.receive_replicated_write(entry)

    def receive_replicated_write(self, entry):
        # Merge the sender's clock into the local clock, then apply the write
        for node, count in entry['vector_clock'].items():
            self.vector_clock[node] = max(self.vector_clock.get(node, 0), count)
        self.apply_write(entry)

    def apply_write(self, entry):
        key = entry['key']

        if key not in self.data:
            self.data[key] = []

        # Check for conflicts
        conflicts = self.detect_conflicts(key, entry)

        if conflicts:
            # Resolve conflict using strategy
            resolved_value = self.resolve_conflict(conflicts + [entry])
            self.data[key] = [resolved_value]
        else:
            self.data[key].append(entry)

    def detect_conflicts(self, key, new_entry):
        if key not in self.data:
            return []

        conflicts = []
        for existing in self.data[key]:
            if self.are_concurrent(existing['vector_clock'], new_entry['vector_clock']):
                conflicts.append(existing)

        return conflicts

    def are_concurrent(self, clock_a, clock_b):
        # Concurrent means neither clock dominates the other:
        # each write happened without having seen the other
        a_ahead = any(count > clock_b.get(node, 0) for node, count in clock_a.items())
        b_ahead = any(count > clock_a.get(node, 0) for node, count in clock_b.items())
        return a_ahead and b_ahead

    def resolve_conflict(self, conflicted_entries):
        # Last-writer-wins strategy; node_id breaks timestamp ties deterministically
        return max(conflicted_entries, key=lambda x: (x['timestamp'], x['node_id']))
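
A short usage sketch of the class above (node names are illustrative; peer delivery is simulated by calling receive_replicated_write directly so that the two writes are genuinely concurrent):

# Two leaders accept concurrent writes to the same key
node_a = MultiLeaderReplication('A')
node_b = MultiLeaderReplication('B')

node_a.write('profile:1', 'Alice')
node_b.write('profile:1', 'Alicia')

entry_a = node_a.data['profile:1'][0]
entry_b = node_b.data['profile:1'][0]

# Exchange the writes (normally done by asynchronous replication)
node_b.receive_replicated_write(entry_a)
node_a.receive_replicated_write(entry_b)

# Both nodes detect the conflict and converge on the last-writer-wins value
assert node_a.data['profile:1'][0]['value'] == node_b.data['profile:1'][0]['value']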

2. Conflict Resolution Strategies

Last-Writer-Wins (LWW):

  • Simple timestamp-based resolution
  • Risk of data loss
  • Good for systems where losing updates is acceptable

Multi-Value Resolution:

  • Keep all conflicting values
  • Let application resolve conflicts
  • Used in Amazon's original Dynamo and in Riak (see the sketch below)
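
A minimal sketch of sibling-keeping (the class and helper below are illustrative, not any vendor's client API): a write replaces the versions it supersedes, while genuinely concurrent writes accumulate as siblings for the application to resolve on read.

def dominates(clock_a, clock_b):
    """True if clock_a has seen at least everything clock_b has."""
    return all(clock_a.get(node, 0) >= count for node, count in clock_b.items())


class MultiValueRegister:
    """Keeps every concurrent write as a 'sibling'; the reader resolves them."""

    def __init__(self):
        self.siblings = []  # list of (vector_clock, value) pairs

    def put(self, vector_clock, value):
        # Ignore a write that an existing sibling already supersedes
        if any(dominates(vc, vector_clock) for vc, _ in self.siblings):
            return
        # Drop versions the new write supersedes, keep truly concurrent ones
        self.siblings = [(vc, v) for vc, v in self.siblings
                         if not dominates(vector_clock, vc)]
        self.siblings.append((vector_clock, value))

    def get(self):
        # May return several values; the application must merge or choose
        return [value for _, value in self.siblings]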

Custom Resolution Logic:

  • Application-specific conflict resolution
  • Complex but precise
  • Example: merge shopping cart items (sketched below)
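
A hedged sketch of the shopping-cart example (taking the maximum quantity per item is one possible merge policy; distinguishing "removed" from "never added" would additionally require tombstones):

def merge_shopping_carts(conflicting_carts):
    """Application-specific resolution: union the items across all conflicting
    cart versions and keep the largest quantity seen for each item."""
    merged = {}
    for cart in conflicting_carts:
        for item_id, quantity in cart.items():
            merged[item_id] = max(merged.get(item_id, 0), quantity)
    return merged


# Two concurrent cart versions written in different regions
cart_eu = {'book': 1, 'pen': 2}
cart_us = {'book': 1, 'mug': 1}
print(merge_shopping_carts([cart_eu, cart_us]))  # {'book': 1, 'pen': 2, 'mug': 1}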

Implementation Patterns

Conflict-Free Replicated Data Types (CRDTs)

class GCounterCRDT:
    """Grow-only counter CRDT"""
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> count
    
    def increment(self):
        """Increment local counter"""
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1
    
    def value(self):
        """Get current counter value"""
        return sum(self.counts.values())
    
    def merge(self, other):
        """Merge with another counter (conflict-free)"""
        result = GCounterCRDT(self.node_id)
        
        all_nodes = set(self.counts.keys()) | set(other.counts.keys())
        
        for node in all_nodes:
            self_count = self.counts.get(node, 0)
            other_count = other.counts.get(node, 0)
            result.counts[node] = max(self_count, other_count)
        
        return result

class PNCounterCRDT:
    """Increment/Decrement counter CRDT"""
    def __init__(self, node_id):
        self.node_id = node_id
        self.increments = GCounterCRDT(node_id)
        self.decrements = GCounterCRDT(node_id)
    
    def increment(self):
        self.increments.increment()
    
    def decrement(self):
        self.decrements.increment()
    
    def value(self):
        return self.increments.value() - self.decrements.value()
    
    def merge(self, other):
        result = PNCounterCRDT(self.node_id)
        result.increments = self.increments.merge(other.increments)
        result.decrements = self.decrements.merge(other.decrements)
        return result
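
A short usage sketch of the counters above (replica IDs are illustrative): each replica updates independently, and because merge is commutative, associative, and idempotent, replicas converge no matter how state is exchanged.

counter_a = PNCounterCRDT('A')
counter_b = PNCounterCRDT('B')

# Independent updates on each replica, no coordination needed
counter_a.increment()
counter_a.increment()
counter_b.increment()
counter_b.decrement()

# Merging in either direction yields the same converged value
merged = counter_a.merge(counter_b)
print(merged.value())  # 2  (three increments minus one decrement)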

Real-World Examples

Amazon DynamoDB Global Tables

# DynamoDB Global Tables replicate writes across regions automatically,
# resolving concurrent updates with last-writer-wins
import time

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('GlobalTable')

# Writes to any region are replicated globally
table.put_item(
    Item={
        'id': '123',
        'data': 'value',
        'last_updated': int(time.time())
    }
)

Apache Cassandra Multi-DC

-- Create keyspace with multi-datacenter replication
CREATE KEYSPACE mykeyspace
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3,
    'datacenter2': 3
};

-- Consistency level is set by the client per statement (shown here with
-- the cqlsh CONSISTENCY command; drivers expose an equivalent option)
CONSISTENCY LOCAL_QUORUM;

INSERT INTO users (id, name, email) VALUES (1, 'John', 'john@example.com');

Performance Considerations

Write Latency:

  • Local writes: ~1-5ms
  • Cross-region replication: 50-200ms
  • Conflict resolution overhead: 10-50ms

Throughput:

  • Write throughput grows with the number of leaders when traffic is partitioned (e.g., by region), but every leader must still apply every replicated write
  • Limited by network bandwidth for replication
  • Conflict rate affects performance

Best Practices

  1. Minimize Conflicts:
    • Partition data by geography/user
    • Use CRDTs when possible
    • Implement application-level coordination
  2. Monitor Conflicts (see the sketch after this list):
    • Track conflict rates
    • Measure resolution latency
    • Alert on high conflict scenarios
  3. Design for Conflicts:
    • Plan conflict resolution strategies
    • Test edge cases thoroughly
    • Provide manual resolution tools
  4. Network Optimization:
    • Compress replication data
    • Batch updates when possible
    • Use dedicated replication channels
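
As a sketch of practice 2 (monitoring conflicts), a small sliding-window conflict-rate tracker; the window size, threshold, and class name are illustrative assumptions rather than part of any monitoring library.

import time
from collections import deque


class ConflictMonitor:
    def __init__(self, window_seconds=60, alert_threshold=0.05):
        self.window = window_seconds
        self.alert_threshold = alert_threshold
        self.writes = deque()     # timestamps of applied writes
        self.conflicts = deque()  # timestamps of writes that needed resolution

    def record_write(self, had_conflict):
        now = time.time()
        self.writes.append(now)
        if had_conflict:
            self.conflicts.append(now)
        self._evict(now)

    def conflict_rate(self):
        self._evict(time.time())
        return len(self.conflicts) / len(self.writes) if self.writes else 0.0

    def should_alert(self):
        return self.conflict_rate() > self.alert_threshold

    def _evict(self, now):
        # Drop samples that have fallen out of the sliding window
        for samples in (self.writes, self.conflicts):
            while samples and now - samples[0] > self.window:
                samples.popleft()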

Trade-offs

Advantages:

  • High write availability
  • Low latency for geo-distributed writes
  • No single point of failure
  • Better performance for global applications

Disadvantages:

  • Complex conflict resolution
  • Potential data inconsistency
  • Higher implementation complexity
  • Difficult to reason about consistency

Multi-leader replication is powerful but complex, requiring careful design and conflict resolution strategies to maintain data integrity across distributed systems.

Related Concepts

single-leader-replication
conflict-resolution
vector-clocks
crdt

Used By

cassandra
dynamo
couchdb
riak