Dynamo: Amazon's Highly Available Key-value Store

Research Paper

2007
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels
distributed-systems, NoSQL, Amazon, key-value-store, availability, consistency

Abstract

Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which serves tens of millions of customers at peak times, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many data centers around the world. At this scale, small and large components fail continuously, and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems.

This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

Key Design Principles

Design Requirements

  1. Incremental scalability: Nodes can be added one at a time, with minimal impact on the system and its operators and no manual repartitioning
  2. Symmetry: All nodes should have the same set of responsibilities
  3. Decentralization: Avoid single points of failure
  4. Heterogeneity: System should be able to exploit heterogeneity in the infrastructure

Service Level Agreement (SLA)

  • Availability: 99.9% availability
  • Performance: 99.9th-percentile response time under 300 ms (the paper's example SLA, measured at a peak load of 500 requests per second)
  • Scalability: Scale out incrementally to absorb growing request volumes
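Dynamo's SLAs are stated against tail latency (the 99.9th percentile), not the average, because the slowest requests are often the ones hitting customers with the most data. A minimal sketch of checking such an SLA against a batch of latency samples (the function names and sample data are illustrative, not from the paper):

```python
def percentile(samples, p):
    """Approximate percentile by rank: value at position round(p * n)."""
    ordered = sorted(samples)
    rank = min(len(ordered) - 1, round(p * len(ordered)))
    return ordered[rank]

def meets_sla(latencies_ms, threshold_ms=300.0, p=0.999):
    """True if the p-th percentile latency is under the SLA threshold."""
    return percentile(latencies_ms, p) < threshold_ms

# 10,000 requests: mostly fast, a few slow, one outlier.
latencies = [20.0] * 9990 + [250.0] * 9 + [800.0]
print(meets_sla(latencies))  # the single outlier sits above the 99.9th percentile
```

Note how a single 800 ms outlier in 10,000 requests does not break a 99.9th-percentile SLA, while a fat tail of slow requests would.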

System Architecture

Components

  1. Request Coordination: A coordinator node, selected via consistent hashing, executes reads and writes on behalf of clients
  2. Membership and Failure Detection: Gossip-based protocol
  3. Failure Handling: Hinted handoff and anti-entropy
  4. Scaling: Virtual nodes for load balancing

Data Partitioning

  • Consistent Hashing: Distributes load across nodes
  • Virtual Nodes: Improves load distribution
  • Replication: Each data item replicated at N nodes

Replication

  • Strategy: Each key is assigned to a coordinator node
  • Replication: Coordinator replicates key to N-1 successor nodes
  • Preference List: List of nodes responsible for a key
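The partitioning and replication bullets above fit together: keys and virtual nodes hash onto the same ring, and a key's preference list is the first N distinct physical nodes found walking clockwise from the key's position. A minimal sketch, assuming MD5 as the ring hash (the paper uses MD5; the class and method names here are illustrative):

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    # MD5 maps keys and virtual-node tokens onto the same 128-bit ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring with several virtual nodes per physical node."""
    def __init__(self, nodes, vnodes=8):
        # Each physical node owns `vnodes` tokens, smoothing load distribution.
        self.tokens = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )

    def preference_list(self, key, n=3):
        """First n distinct physical nodes clockwise from the key's position."""
        start = bisect_right(self.tokens, (_hash(key),))
        result = []
        for i in range(len(self.tokens)):
            node = self.tokens[(start + i) % len(self.tokens)][1]
            if node not in result:
                result.append(node)
            if len(result) == n:
                break
        return result

ring = Ring(["A", "B", "C", "D"])
print(ring.preference_list("cart:alice", n=3))  # coordinator first, then N-1 successors
```

The first node in the list acts as the key's coordinator; skipping duplicate physical nodes ensures the N replicas land on distinct machines even though virtual-node tokens interleave on the ring.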

Consistency and Versioning

Eventual Consistency

  • Model: System provides eventual consistency
  • Trade-off: Availability over strong consistency
  • Resolution: Application-assisted conflict resolution

Vector Clocks

  • Purpose: Capture causality between different versions
  • Format: List of (node, counter) pairs
  • Comparison: Used to determine version ordering

Conflict Resolution

  • Last-write-wins: Simple resolution strategy
  • Application-level: Custom resolution logic
  • Sloppy Quorum: Reads and writes go to the first N healthy nodes from the preference list rather than a strict quorum, favoring availability over a guaranteed read/write overlap
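Quorum behavior is governed by three tunable parameters: N replicas, R nodes that must answer a read, and W nodes that must acknowledge a write. With R + W > N, every read quorum intersects every write quorum under a strict quorum; Dynamo's sloppy quorum relaxes this by substituting healthy nodes for failed ones, so the overlap is not guaranteed during failures. A one-line check of the overlap condition:

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True if every read quorum of size r must intersect every write
    quorum of size w among n replicas (strict-quorum reasoning)."""
    return r + w > n

print(quorum_overlaps(3, 2, 2))  # the paper's common configuration (N=3, R=2, W=2)
print(quorum_overlaps(3, 1, 1))  # low latency, but reads may miss recent writes
```

Lowering W speeds up writes at the cost of durability; lowering R speeds up reads at the cost of freshness. The (3, 2, 2) configuration is the balance the paper reports as typical.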

Failure Handling

Hinted Handoff

  • Purpose: Handle temporary node failures
  • Process: Store writes for failed nodes locally
  • Recovery: Transfer data when node comes back online
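The three bullets above can be sketched as a toy simulation: when a replica in the preference list is down, a substitute node accepts the write along with a "hint" naming the intended owner, and delivers it once the owner recovers. All class and function names here are illustrative, not from the paper:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}     # key -> value held by this node
        self.hints = []     # (intended_owner, key, value) held for failed nodes
        self.alive = True

def write(key, value, preference_list, standby):
    """Write to each replica; divert to the standby with a hint if one is down."""
    for replica in preference_list:
        if replica.alive:
            replica.store[key] = value
        else:
            standby.hints.append((replica, key, value))

def deliver_hints(standby):
    """On recovery, hand hinted writes back to their intended owners."""
    remaining = []
    for owner, key, value in standby.hints:
        if owner.alive:
            owner.store[key] = value
        else:
            remaining.append((owner, key, value))
    standby.hints = remaining

a, b, d = Node("A"), Node("B"), Node("D")
b.alive = False
write("cart:alice", ["book"], [a, b], standby=d)   # B is down; D keeps a hint
b.alive = True
deliver_hints(d)                                   # D hands the write back to B
print(b.store)  # {'cart:alice': ['book']}
```

The write never blocks on the failed node, which is the point: availability is preserved during transient failures, and the hinted replica restores full durability once the owner returns.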

Anti-Entropy

  • Purpose: Synchronize divergent replicas
  • Method: Merkle trees for efficient comparison
  • Process: Compare and repair differences
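Merkle trees make the comparison step cheap: each replica hashes its key range into a tree, and two replicas compare roots; equal roots prove the whole range is in sync, while differing roots direct the replicas to recurse into subtrees and repair only the divergent keys. A minimal bottom-up sketch, assuming SHA-256 as the hash (the paper does not prescribe a specific hash; the leaf encoding here is illustrative):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Bottom-up Merkle tree over a list of leaf byte strings."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the odd node out
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
replica_b = [b"k1=v1", b"k2=STALE", b"k3=v3", b"k4=v4"]
print(merkle_root(replica_a) == merkle_root(replica_a))  # True: in sync
print(merkle_root(replica_a) == merkle_root(replica_b))  # False: recurse and repair
```

A full anti-entropy pass would keep the intermediate levels and walk mismatching subtrees down to the leaves, so the data transferred is proportional to the divergence, not the key-range size.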

Membership

  • Gossip Protocol: Disseminate membership changes
  • Failure Detection: Local failure detection
  • Bootstrap: New nodes learn about system state

Performance Optimizations

Load Balancing

  • Virtual Nodes: Multiple virtual nodes per physical node
  • Consistent Hashing: Uniform distribution
  • Dynamic Adjustment: Adapt to load changes

Caching

  • Write Buffering: Writes can be acknowledged from an in-memory object buffer and flushed to storage later, trading durability for lower latency
  • Read Path: Reads check the object buffer first, so recently written keys are served from memory
  • Durability Safeguard: The coordinator can require one of the N replicas to perform a durable write before acknowledging

Request Routing

  • Server-driven: Requests pass through a load balancer to a coordinator node
  • Client-driven: Clients cache ring membership state and route requests directly to the right replicas, avoiding an extra network hop
  • Failover: Requests are retried against other healthy replicas on failure

Use Cases at Amazon

  1. Shopping Cart: Store user shopping cart data
  2. Session Management: Manage user sessions
  3. Product Catalog: Store product information
  4. Recommendation Engine: Store user preferences
  5. Wish Lists: Store user wish list data

Trade-offs and Limitations

Consistency vs. Availability

  • CAP Theorem: Under a network partition, a system cannot guarantee both strong consistency and availability
  • Choice: Dynamo chooses availability over consistency
  • Impact: Applications must handle eventual consistency

Conflict Resolution

  • Complexity: Applications must implement conflict resolution
  • Performance: Conflict resolution can impact performance
  • Correctness: Incorrect resolution can cause data loss

Scalability Limits

  • Gossip Overhead: Membership protocol doesn't scale to thousands of nodes
  • Network Partitions: Large partitions can cause issues
  • Storage Limits: Single node storage capacity

Impact on Modern Systems

Dynamo influenced the design of many modern distributed systems:

  • Apache Cassandra: Combines Dynamo's distribution techniques with Bigtable's data model
  • Riak: Distributed NoSQL database closely modeled on the Dynamo paper
  • Voldemort: LinkedIn's open-source distributed key-value store
  • Amazon DynamoDB: Fully managed database service that builds on lessons from Dynamo

Why It Matters for Software Engineering

Understanding Dynamo is crucial for:

  • System Design: Designing highly available distributed systems
  • NoSQL Databases: Understanding eventual consistency models
  • CAP Theorem: Understanding consistency-availability trade-offs
  • Fault Tolerance: Learning about failure handling in distributed systems

The paper demonstrates how to build a highly available distributed system that prioritizes availability over strong consistency, a common pattern in modern web-scale applications.
