Bigtable: A Distributed Storage System for Structured Data
Research Paper
Abstract
Abstract
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.
Key Concepts
Data Model
Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
(row:string, column:string, time:int64) → string
Components
- Tablet: A contiguous range of rows stored together
- Tablet Server: Manages a set of tablets
- Master Server: Coordinates tablet servers and handles schema changes
- Chubby: Distributed lock service for coordination
Column Families
- Columns are grouped into column families
- All data in a column family is typically the same type
- Column families must be created before data is stored
- Column keys within a family can be created on demand
Architecture
Master Server
- Assigns tablets to tablet servers
- Detects addition and expiration of tablet servers
- Balances tablet server load
- Handles schema changes (table and column family creation/deletion)
- Garbage collects files in GFS
Tablet Server
- Manages a set of tablets (typically 10-1000 tablets per server)
- Handles read and write requests to the tablets it has loaded
- Splits tablets that have grown too large
Tablet Location
- Uses a three-level hierarchy analogous to that of a B+ tree
- Root tablet contains the location of all tablets in a special METADATA table
- METADATA tablets contain the location of a set of user tablets
Implementation
SSTable
- Immutable files containing a sequence of key-value pairs
- Sorted by key
- Indexed for fast lookups
- Stored in GFS (Google File System)
Memtable
- In-memory buffer for recent writes
- Sorted by key
- Flushed to disk when it reaches a threshold size
Compaction
- Minor compaction: Converts memtable to SSTable
- Merging compaction: Combines multiple SSTables
- Major compaction: Merges all SSTables into a single SSTable
Performance Characteristics
Scalability
- Handles petabytes of data
- Scales to thousands of servers
- Automatic load balancing
- Dynamic tablet splitting
Consistency
- Strong consistency within a single row
- Eventual consistency across rows
- Single-row transactions
Availability
- Fault tolerance through replication
- Automatic failover
- Data recovery from GFS
Use Cases at Google
- Web Search: Storing web pages and their metadata
- Google Earth: Storing satellite imagery and geographic data
- Google Analytics: Storing user interaction data
- Google Finance: Storing financial market data
- Personalized Search: Storing user preferences and history
Design Decisions
Why NoSQL?
- Need for massive scalability
- Flexible schema requirements
- High write throughput
- Geographic distribution
Why Column Families?
- Logical grouping of related data
- Efficient compression
- Access control at family level
- Different compression algorithms per family
Why Timestamps?
- Versioning of data
- Time-based queries
- Automatic garbage collection of old versions
Impact on Modern Systems
Bigtable influenced the design of many modern NoSQL databases:
- Apache HBase: Open-source Bigtable implementation
- Cassandra: Distributed NoSQL database
- DynamoDB: Amazon's managed NoSQL service
- MongoDB: Document-oriented database
Why It Matters for Software Engineering
Understanding Bigtable is crucial for:
- System Design: Designing scalable storage systems
- Database Architecture: Understanding NoSQL principles
- Distributed Systems: Learning about consistency and partitioning
- Cloud Computing: Understanding managed database services
The paper demonstrates how to build a highly scalable, distributed storage system that can handle the demands of modern web-scale applications.
PDF Document
Loading PDF...
Analysis & Content
Click the button above to view detailed analysis and discussion of this paper