Bigtable: A Distributed Storage System for Structured Data

Research Paper

2006
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
distributed-systemsNoSQLstorageGoogledatabasesscalability

Abstract

Abstract

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.

Key Concepts

Data Model

Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.

(row:string, column:string, time:int64) → string

Components

  1. Tablet: A contiguous range of rows stored together
  2. Tablet Server: Manages a set of tablets
  3. Master Server: Coordinates tablet servers and handles schema changes
  4. Chubby: Distributed lock service for coordination

Column Families

  • Columns are grouped into column families
  • All data in a column family is typically the same type
  • Column families must be created before data is stored
  • Column keys within a family can be created on demand

Architecture

Master Server

  • Assigns tablets to tablet servers
  • Detects addition and expiration of tablet servers
  • Balances tablet server load
  • Handles schema changes (table and column family creation/deletion)
  • Garbage collects files in GFS

Tablet Server

  • Manages a set of tablets (typically 10-1000 tablets per server)
  • Handles read and write requests to the tablets it has loaded
  • Splits tablets that have grown too large

Tablet Location

  • Uses a three-level hierarchy analogous to that of a B+ tree
  • Root tablet contains the location of all tablets in a special METADATA table
  • METADATA tablets contain the location of a set of user tablets

Implementation

SSTable

  • Immutable files containing a sequence of key-value pairs
  • Sorted by key
  • Indexed for fast lookups
  • Stored in GFS (Google File System)

Memtable

  • In-memory buffer for recent writes
  • Sorted by key
  • Flushed to disk when it reaches a threshold size

Compaction

  • Minor compaction: Converts memtable to SSTable
  • Merging compaction: Combines multiple SSTables
  • Major compaction: Merges all SSTables into a single SSTable

Performance Characteristics

Scalability

  • Handles petabytes of data
  • Scales to thousands of servers
  • Automatic load balancing
  • Dynamic tablet splitting

Consistency

  • Strong consistency within a single row
  • Eventual consistency across rows
  • Single-row transactions

Availability

  • Fault tolerance through replication
  • Automatic failover
  • Data recovery from GFS

Use Cases at Google

  1. Web Search: Storing web pages and their metadata
  2. Google Earth: Storing satellite imagery and geographic data
  3. Google Analytics: Storing user interaction data
  4. Google Finance: Storing financial market data
  5. Personalized Search: Storing user preferences and history

Design Decisions

Why NoSQL?

  • Need for massive scalability
  • Flexible schema requirements
  • High write throughput
  • Geographic distribution

Why Column Families?

  • Logical grouping of related data
  • Efficient compression
  • Access control at family level
  • Different compression algorithms per family

Why Timestamps?

  • Versioning of data
  • Time-based queries
  • Automatic garbage collection of old versions

Impact on Modern Systems

Bigtable influenced the design of many modern NoSQL databases:

  • Apache HBase: Open-source Bigtable implementation
  • Cassandra: Distributed NoSQL database
  • DynamoDB: Amazon's managed NoSQL service
  • MongoDB: Document-oriented database

Why It Matters for Software Engineering

Understanding Bigtable is crucial for:

  • System Design: Designing scalable storage systems
  • Database Architecture: Understanding NoSQL principles
  • Distributed Systems: Learning about consistency and partitioning
  • Cloud Computing: Understanding managed database services

The paper demonstrates how to build a highly scalable, distributed storage system that can handle the demands of modern web-scale applications.

Loading PDF...

Analysis & Content

Click the button above to view detailed analysis and discussion of this paper

Key insights
Detailed breakdown