The Google File System

Research Paper

2003
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
distributed-systems, file-system, Google, storage, replication, fault-tolerance

Abstract

We have designed and implemented the Google File System (GFS), a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.

Key Design Assumptions

Workload Characteristics

  1. Component failures are the norm: Hardware failures are common, not exceptional
  2. Files are huge by traditional standards: Multi-GB files are common
  3. Most mutations are appends: Files are typically written once, read many times
  4. Co-designing applications and file system: Applications can be modified to work with GFS

System Characteristics

  • Large files: Multi-GB files are the common case
  • Sequential reads: Large streaming reads of 1MB or more
  • Append writes: Many clients appending to the same file concurrently
  • High sustained bandwidth: More important than low latency

Architecture

Components

  1. Master: Single master server managing metadata
  2. Chunkservers: Multiple chunkservers storing data chunks
  3. Clients: Applications that access the file system

File Organization

  • Files are divided into fixed-size chunks (64MB by default)
  • Each chunk has a globally unique 64-bit chunk handle
  • Chunks are replicated on multiple chunkservers (3 replicas by default)
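
As a rough illustration of how the chunk abstraction is used, the sketch below shows the offset-to-chunk translation a client performs before asking the master for a chunk handle and replica locations; the function name is illustrative, not from the paper.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS default

def chunk_index(byte_offset: int) -> int:
    """Map a byte offset within a file to its chunk index.

    GFS clients compute this locally, then ask the master for the chunk
    handle and replica locations of the (file name, chunk index) pair.
    """
    return byte_offset // CHUNK_SIZE

# Byte 200,000,000 of a file lands in chunk index 2.
assert chunk_index(200_000_000) == 2
```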

Master Responsibilities

  • Metadata management: File and chunk namespaces
  • Chunk placement: Decides where to place new chunks
  • Replication: Manages chunk replication
  • Garbage collection: Removes orphaned chunks
  • Chunk migration: Rebalances load across chunkservers

Design Decisions

Single Master

  • Pros: Simple design, centralized metadata management
  • Cons: Potential bottleneck and single point of failure
  • Mitigation: Minimal master involvement in reads/writes, master state replication

Large Chunk Size (64MB)

  • Benefits: Reduces metadata size, reduces client-master interactions
  • Trade-offs: A small file occupies a single chunk, which can become a hot spot when many clients access it; internal fragmentation is mitigated by lazy space allocation
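
The metadata savings are easy to quantify: the paper reports that the master keeps less than 64 bytes of metadata per 64 MB chunk, so even very large files cost the master little memory. A back-of-the-envelope calculation (function name is illustrative):

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB default chunk size
BYTES_PER_CHUNK_METADATA = 64   # the paper cites < 64 bytes of metadata per chunk

def master_memory_for(file_bytes: int) -> int:
    """Rough upper bound on master memory used by one file's chunk metadata."""
    num_chunks = -(-file_bytes // CHUNK_SIZE)   # ceiling division
    return num_chunks * BYTES_PER_CHUNK_METADATA

# A 1 TB file spans 16,384 chunks, i.e. roughly 1 MB of master memory.
print(master_memory_for(1024**4))   # 1048576
```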

Relaxed Consistency Model

  • File namespace mutations: Atomic and consistent
  • File data mutations: A region is consistent if all clients see the same data on every replica; it is defined if, in addition, it reflects one mutation's writes in their entirety
  • Concurrent mutations: Successful concurrent writes leave the region consistent but undefined; all clients see the same data, but it may mix fragments from multiple mutations

Implementation Details

Chunk Replication

  • Default replication factor: 3
  • Placement strategy: Spread replicas across racks
  • Replication process: Master initiates replication when chunkservers fail
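
The placement strategy above can be sketched as follows; this is a simplified illustration of the stated goal (spread replicas across racks for fault tolerance and read bandwidth), not the actual placement policy, and all names are made up.

```python
import random
from collections import defaultdict

def place_replicas(chunkservers: dict, num_replicas: int = 3) -> list:
    """Pick chunkservers for a new chunk, preferring distinct racks.

    `chunkservers` maps a server id to its rack id (illustrative input).
    """
    by_rack = defaultdict(list)
    for server, rack in chunkservers.items():
        by_rack[rack].append(server)

    racks = list(by_rack)
    random.shuffle(racks)

    chosen = []
    # First pass: at most one replica per rack.
    for rack in racks[:num_replicas]:
        chosen.append(random.choice(by_rack[rack]))

    # Second pass: if there are fewer racks than replicas, fill from any servers.
    remaining = [s for s in chunkservers if s not in chosen]
    random.shuffle(remaining)
    chosen.extend(remaining[: num_replicas - len(chosen)])
    return chosen

servers = {"cs1": "rackA", "cs2": "rackA", "cs3": "rackB", "cs4": "rackC"}
print(place_replicas(servers))   # e.g. ['cs2', 'cs3', 'cs4']
```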

Master State

  • In-memory: File and chunk namespaces, chunk locations
  • Persistent: Operation log for metadata changes
  • Checkpoints: Periodic snapshots of master state
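
A minimal sketch of these structures, assuming a JSON-lines operation log and ignoring locking, log replication, and recovery; class and field names are illustrative, not GFS code.

```python
import json

class MasterState:
    """Toy version of the master's metadata handling (not GFS code)."""

    def __init__(self, log_path: str = "oplog.jsonl"):
        self.files = {}    # path -> list of chunk handles (persisted via the log)
        self.chunks = {}   # chunk handle -> chunkserver ids; not logged, rebuilt
                           # from chunkserver reports after a restart
        self.log = open(log_path, "a")

    def create_file(self, path: str) -> None:
        # Write-ahead: persist the record before applying it in memory.
        self.log.write(json.dumps({"op": "create", "path": path}) + "\n")
        self.log.flush()
        self.files[path] = []

    def checkpoint(self, path: str = "checkpoint.json") -> None:
        # Periodic snapshot so recovery only replays the log tail.
        with open(path, "w") as f:
            json.dump({"files": self.files}, f)
```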

Client Interactions

  1. Read: Client queries master for chunk locations, then reads from chunkservers
  2. Write: Client gets replica locations from the master, pushes data to all replicas, and the primary (the replica holding the lease) assigns a serial order and forwards the write to the secondaries
  3. Record append: Same data flow as a write, but the primary picks the offset, so many clients can append to the same file concurrently without extra locking
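
The read path (item 1 above) can be sketched like this; `master.lookup` and `replica.read` are hypothetical RPC stubs, and the real client also caches chunk locations and handles retries and stale replicas.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB

def gfs_read(master, file_name: str, offset: int, length: int) -> bytes:
    """Sketch of a GFS read from the client's point of view."""
    index = offset // CHUNK_SIZE
    # One small round trip to the master for metadata only.
    handle, replicas = master.lookup(file_name, index)
    # Data flows directly from a chunkserver, typically the closest replica.
    replica = replicas[0]
    return replica.read(handle, offset % CHUNK_SIZE, length)
```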

Fault Tolerance

Chunkserver Failures

  • Detection: Master monitors chunkservers via heartbeats
  • Recovery: Master re-replicates chunks from failed servers
  • Impact: Minimal impact on ongoing operations
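
Failure detection boils down to tracking heartbeat timestamps; a minimal sketch, with an illustrative timeout value (the paper does not prescribe one):

```python
import time

HEARTBEAT_TIMEOUT = 60.0   # seconds; illustrative, not a value from the paper

def dead_chunkservers(last_heartbeat, now=None):
    """Return chunkservers whose last HeartBeat message is too old.

    `last_heartbeat` maps chunkserver id -> timestamp of its last heartbeat.
    Chunks whose replica count drops below target are then queued for
    re-replication by the master.
    """
    now = time.time() if now is None else now
    return [cs for cs, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]
```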

Master Failures

  • Detection: External monitoring systems
  • Recovery: A new master restores its state from the latest checkpoint plus the operation log
  • Impact: Shadow masters can serve read-only metadata requests while the primary is unavailable

Network Partitions

  • Split-brain prevention: The master grants time-limited leases to primary replicas, so a partitioned chunkserver cannot keep acting as primary after its lease expires
  • Consistency: Chunk version numbers let the master detect and garbage-collect stale replicas that missed mutations
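
The lease mechanism is what keeps two replicas from both acting as primary during a partition; the paper uses an initial lease timeout of 60 seconds. A toy sketch (class name and fields are illustrative):

```python
import time

LEASE_DURATION = 60.0   # seconds; the paper's initial lease timeout

class ChunkLease:
    """Toy lease the master grants to one replica (the primary) of a chunk."""

    def __init__(self, chunk_handle: int, primary: str):
        self.chunk_handle = chunk_handle
        self.primary = primary
        self.expires_at = time.time() + LEASE_DURATION

    def is_valid(self) -> bool:
        # A partitioned primary stops ordering mutations once this is False,
        # so the master can safely grant a new lease to another replica.
        return time.time() < self.expires_at
```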

Performance Characteristics

Throughput

  • Aggregate throughput: High due to parallel access
  • Single-file throughput: Bounded by the chunkservers holding its chunks; small, heavily shared files can become hot spots
  • Master throughput: High due to minimal involvement

Latency

  • Read latency: Low for cached metadata
  • Write latency: Higher due to replication
  • Append latency: Optimized for concurrent appends

Use Cases at Google

  1. Web Search: Storing crawled web pages
  2. MapReduce: Input and output storage
  3. Bigtable: Underlying storage layer
  4. Google Earth: Storing satellite imagery
  5. YouTube: Storing video files

Impact on Modern Systems

GFS influenced the design of many later distributed storage systems:

  • HDFS: Hadoop Distributed File System
  • Ceph: Distributed storage system
  • GlusterFS: Scale-out network-attached storage
  • Amazon S3: Object storage service

Why It Matters for Software Engineering

Understanding GFS is crucial for:

  • System Design: Designing distributed storage systems
  • Big Data: Understanding how large-scale data processing works
  • Cloud Computing: Understanding distributed storage principles
  • Fault Tolerance: Learning about handling failures in distributed systems

The paper demonstrates how to design a distributed file system that can handle the scale and reliability requirements of modern web-scale applications.
