The Google File System

Research Paper

2003
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
distributed-systems, file-system, Google, storage, replication, fault-tolerance

Abstract

We have designed and implemented the Google File System (GFS), a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.

Key Design Assumptions

Workload Characteristics

  1. Component failures are the norm: Hardware failures are common, not exceptional
  2. Files are huge by traditional standards: Multi-GB files are common
  3. Most mutations are appends: Files are typically written once, read many times
  4. Co-designing applications and file system: Applications can be modified to work with GFS

System Characteristics

  • Large files: Multi-GB files are the common case
  • Sequential reads: Large streaming reads of 1MB or more
  • Append writes: Many clients appending to the same file concurrently
  • High sustained bandwidth: More important than low latency

Architecture

Components

  1. Master: Single master server managing metadata
  2. Chunkservers: Multiple chunkservers storing data chunks
  3. Clients: Applications that access the file system

File Organization

  • Files are divided into fixed-size chunks (64MB by default)
  • Each chunk has a globally unique 64-bit chunk handle
  • Chunks are replicated on multiple chunkservers (3 replicas by default)
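
As a rough illustration of how the chunk abstraction is used, the sketch below shows the offset-to-chunk translation a client performs before asking the master for a chunk handle and replica locations; the function name is illustrative, not from the paper.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS default

def chunk_index(byte_offset: int) -> int:
    """Map a byte offset within a file to its chunk index.

    GFS clients compute this locally, then ask the master for the chunk
    handle and replica locations of the (file name, chunk index) pair.
    """
    return byte_offset // CHUNK_SIZE

# Byte 200,000,000 of a file lands in chunk index 2.
assert chunk_index(200_000_000) == 2
```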

Master Responsibilities

  • Metadata management: File and chunk namespaces
  • Chunk placement: Decides where to place new chunks
  • Replication: Manages chunk replication
  • Garbage collection: Removes orphaned chunks
  • Chunk migration: Rebalances load across chunkservers

Design Decisions

Single Master

  • Pros: Simple design, centralized metadata management
  • Cons: Potential bottleneck and single point of failure
  • Mitigation: Minimal master involvement in reads/writes, master state replication

Large Chunk Size (64MB)

  • Benefits: Reduces metadata size, reduces client-master interactions
  • Trade-offs: A small file occupies a single chunk, which can become a hot spot when many clients access it; internal fragmentation is mitigated by lazy space allocation
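
The metadata savings are easy to quantify: the paper reports that the master keeps less than 64 bytes of metadata per 64 MB chunk, so even very large files cost the master little memory. A back-of-the-envelope calculation (function name is illustrative):

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB default chunk size
BYTES_PER_CHUNK_METADATA = 64   # the paper cites < 64 bytes of metadata per chunk

def master_memory_for(file_bytes: int) -> int:
    """Rough upper bound on master memory used by one file's chunk metadata."""
    num_chunks = -(-file_bytes // CHUNK_SIZE)   # ceiling division
    return num_chunks * BYTES_PER_CHUNK_METADATA

# A 1 TB file spans 16,384 chunks, i.e. roughly 1 MB of master memory.
print(master_memory_for(1024**4))   # 1048576
```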

Relaxed Consistency Model

  • File namespace mutations: Atomic and consistent
  • File data mutations: A region is consistent if all clients see the same data on every replica; it is defined if, in addition, it reflects one mutation's writes in their entirety
  • Concurrent mutations: Successful concurrent writes leave the region consistent but undefined; all clients see the same data, but it may mix fragments from multiple mutations

Implementation Details

Chunk Replication

  • Default replication factor: 3
  • Placement strategy: Spread replicas across racks
  • Replication process: Master initiates replication when chunkservers fail
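
The placement strategy above can be sketched as follows; this is a simplified illustration of the stated goal (spread replicas across racks for fault tolerance and read bandwidth), not the actual placement policy, and all names are made up.

```python
import random
from collections import defaultdict

def place_replicas(chunkservers: dict, num_replicas: int = 3) -> list:
    """Pick chunkservers for a new chunk, preferring distinct racks.

    `chunkservers` maps a server id to its rack id (illustrative input).
    """
    by_rack = defaultdict(list)
    for server, rack in chunkservers.items():
        by_rack[rack].append(server)

    racks = list(by_rack)
    random.shuffle(racks)

    chosen = []
    # First pass: at most one replica per rack.
    for rack in racks[:num_replicas]:
        chosen.append(random.choice(by_rack[rack]))

    # Second pass: if there are fewer racks than replicas, fill from any servers.
    remaining = [s for s in chunkservers if s not in chosen]
    random.shuffle(remaining)
    chosen.extend(remaining[: num_replicas - len(chosen)])
    return chosen

servers = {"cs1": "rackA", "cs2": "rackA", "cs3": "rackB", "cs4": "rackC"}
print(place_replicas(servers))   # e.g. ['cs2', 'cs3', 'cs4']
```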

Master State

  • In-memory: File and chunk namespaces, chunk locations
  • Persistent: Operation log for metadata changes
  • Checkpoints: Periodic snapshots of master state
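
A minimal sketch of these structures, assuming a JSON-lines operation log and ignoring locking, log replication, and recovery; class and field names are illustrative, not GFS code.

```python
import json

class MasterState:
    """Toy version of the master's metadata handling (not GFS code)."""

    def __init__(self, log_path: str = "oplog.jsonl"):
        self.files = {}    # path -> list of chunk handles (persisted via the log)
        self.chunks = {}   # chunk handle -> chunkserver ids; not logged, rebuilt
                           # from chunkserver reports after a restart
        self.log = open(log_path, "a")

    def create_file(self, path: str) -> None:
        # Write-ahead: persist the record before applying it in memory.
        self.log.write(json.dumps({"op": "create", "path": path}) + "\n")
        self.log.flush()
        self.files[path] = []

    def checkpoint(self, path: str = "checkpoint.json") -> None:
        # Periodic snapshot so recovery only replays the log tail.
        with open(path, "w") as f:
            json.dump({"files": self.files}, f)
```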

Client Interactions

  1. Read: Client queries master for chunk locations, then reads from chunkservers
  2. Write: Client gets replica locations from the master, pushes data to all replicas, and the primary (the replica holding the lease) assigns a serial order and forwards the write to the secondaries
  3. Record append: Same data flow as a write, but the primary picks the offset, so many clients can append to the same file concurrently without extra locking
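
The read path (item 1 above) can be sketched like this; `master.lookup` and `replica.read` are hypothetical RPC stubs, and the real client also caches chunk locations and handles retries and stale replicas.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB

def gfs_read(master, file_name: str, offset: int, length: int) -> bytes:
    """Sketch of a GFS read from the client's point of view."""
    index = offset // CHUNK_SIZE
    # One small round trip to the master for metadata only.
    handle, replicas = master.lookup(file_name, index)
    # Data flows directly from a chunkserver, typically the closest replica.
    replica = replicas[0]
    return replica.read(handle, offset % CHUNK_SIZE, length)
```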

Fault Tolerance

Chunkserver Failures

  • Detection: Master monitors chunkservers via heartbeats
  • Recovery: Master re-replicates chunks from failed servers
  • Impact: Minimal impact on ongoing operations
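
Failure detection boils down to tracking heartbeat timestamps; a minimal sketch, with an illustrative timeout value (the paper does not prescribe one):

```python
import time

HEARTBEAT_TIMEOUT = 60.0   # seconds; illustrative, not a value from the paper

def dead_chunkservers(last_heartbeat, now=None):
    """Return chunkservers whose last HeartBeat message is too old.

    `last_heartbeat` maps chunkserver id -> timestamp of its last heartbeat.
    Chunks whose replica count drops below target are then queued for
    re-replication by the master.
    """
    now = time.time() if now is None else now
    return [cs for cs, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]
```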

Master Failures

  • Detection: External monitoring systems
  • Recovery: A new master restores its state from the latest checkpoint plus the operation log
  • Impact: Shadow masters can serve read-only metadata requests while the primary is unavailable

Network Partitions

  • Split-brain prevention: The master grants time-limited leases to primary replicas, so a partitioned chunkserver cannot keep acting as primary after its lease expires
  • Consistency: Chunk version numbers let the master detect and garbage-collect stale replicas that missed mutations
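
The lease mechanism is what keeps two replicas from both acting as primary during a partition; the paper uses an initial lease timeout of 60 seconds. A toy sketch (class name and fields are illustrative):

```python
import time

LEASE_DURATION = 60.0   # seconds; the paper's initial lease timeout

class ChunkLease:
    """Toy lease the master grants to one replica (the primary) of a chunk."""

    def __init__(self, chunk_handle: int, primary: str):
        self.chunk_handle = chunk_handle
        self.primary = primary
        self.expires_at = time.time() + LEASE_DURATION

    def is_valid(self) -> bool:
        # A partitioned primary stops ordering mutations once this is False,
        # so the master can safely grant a new lease to another replica.
        return time.time() < self.expires_at
```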

Performance Characteristics

Throughput

  • Aggregate throughput: High due to parallel access
  • Single-file throughput: Bounded by the chunkservers holding its chunks; small, heavily shared files can become hot spots
  • Master throughput: High due to minimal involvement

Latency

  • Read latency: Low for cached metadata
  • Write latency: Higher due to replication
  • Append latency: Optimized for concurrent appends

Use Cases at Google

  1. Web Search: Storing crawled web pages
  2. MapReduce: Input and output storage
  3. Bigtable: Underlying storage layer
  4. Google Earth: Storing satellite imagery
  5. YouTube: Storing video files

Impact on Modern Systems

GFS influenced the design of many later distributed storage systems:

  • HDFS: Hadoop Distributed File System
  • Ceph: Distributed storage system
  • GlusterFS: Scale-out network-attached storage
  • Amazon S3: Object storage service

Why It Matters for Software Engineering

Understanding GFS is crucial for:

  • System Design: Designing distributed storage systems
  • Big Data: Understanding how large-scale data processing works
  • Cloud Computing: Understanding distributed storage principles
  • Fault Tolerance: Learning about handling failures in distributed systems

The paper demonstrates how to design a distributed file system that can handle the scale and reliability requirements of modern web-scale applications.
