In-Memory Databases

Core Concept

intermediate
20-30 minutes
memory-storage, performance, caching, redis, low-latency, durability

Benefits and challenges of keeping entire datasets in RAM

Overview

In-memory databases store data primarily in main memory (RAM) rather than on disk, enabling extremely fast access and processing. The approach trades storage capacity and durability for large performance gains, which makes it a good fit for applications that need ultra-low latency and high throughput: think of a filing cabinet that is always open and ready.

Popular in-memory databases include Redis, SAP HANA, Oracle TimesTen, and VoltDB, each optimized for different use cases from caching to real-time analytics.

System Architecture Diagram

Core Architecture

In-memory databases work by keeping all data in RAM for ultra-fast access, while still providing persistence mechanisms to survive system restarts. Think of it like having your entire library of books open and ready to read instantly, rather than having to go to the bookshelf each time.

Key Components:

  • Memory Storage: All active data resides in RAM for instant access
  • Persistence Layer: Optional disk storage for durability and recovery
  • Eviction Policies: Smart algorithms to manage memory when it gets full
  • Indexing: Fast lookup structures optimized for memory access patterns

Memory Management: When memory gets full, the database uses eviction policies like LRU (Least Recently Used) to decide which data to remove, similar to how your computer manages RAM when running out of space.

Performance Characteristics:

  • Read Operations: Sub-millisecond response times
  • Write Operations: Near-instantaneous updates
  • Throughput: Can handle millions of operations per second
  • Latency: Consistently low, since there is no disk-seek variability
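These numbers are easiest to appreciate side by side. A rough, illustrative comparison (not a rigorous benchmark) of an in-memory hash lookup versus scanning a file on disk for the same record:

```python
import os
import tempfile
import time

# Build the same dataset in memory and on disk.
data = {f"key:{i}": f"value:{i}" for i in range(100_000)}

path = os.path.join(tempfile.mkdtemp(), "records.txt")
with open(path, "w") as f:
    for k, v in data.items():
        f.write(f"{k}={v}\n")

# In-memory read: a single hash lookup.
start = time.perf_counter()
value = data["key:50000"]
memory_time = time.perf_counter() - start

# Disk read: open the file and scan for the key.
start = time.perf_counter()
disk_value = None
with open(path) as f:
    for line in f:
        k, _, v = line.rstrip("\n").partition("=")
        if k == "key:50000":
            disk_value = v
            break
disk_time = time.perf_counter() - start
```

On typical hardware the in-memory lookup finishes orders of magnitude faster, which is exactly the gap the characteristics above describe.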

Memory Management Strategies

1. Eviction Policies

import time
from collections import OrderedDict, defaultdict

class LRUEvictionPolicy:
    def __init__(self):
        self.access_order = OrderedDict()
    
    def access(self, key):
        """Mark key as recently accessed"""
        if key in self.access_order:
            del self.access_order[key]
        self.access_order[key] = time.time()
    
    def evict(self):
        """Return least recently used key for eviction"""
        if self.access_order:
            return self.access_order.popitem(last=False)[0]
        return None

class LFUEvictionPolicy:
    def __init__(self):
        self.frequencies = defaultdict(int)
        self.frequency_groups = defaultdict(OrderedDict)
        self.min_frequency = 0
    
    def access(self, key):
        """Update frequency count for key"""
        old_freq = self.frequencies[key]
        new_freq = old_freq + 1
        
        self.frequencies[key] = new_freq
        
        # Move from old frequency group to new
        if old_freq > 0 and key in self.frequency_groups[old_freq]:
            del self.frequency_groups[old_freq][key]
            if len(self.frequency_groups[old_freq]) == 0 and old_freq == self.min_frequency:
                self.min_frequency += 1
        else:
            # Brand-new key: the minimum frequency drops back to 1
            self.min_frequency = 1
        
        self.frequency_groups[new_freq][key] = time.time()
    
    def evict(self):
        """Evict least frequently used key"""
        if self.min_frequency in self.frequency_groups and self.frequency_groups[self.min_frequency]:
            key = next(iter(self.frequency_groups[self.min_frequency]))
            del self.frequency_groups[self.min_frequency][key]
            del self.frequencies[key]
            return key
        return None
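In practice the policy is wired directly into the store rather than tracked separately. A minimal capped LRU cache, sketched here on top of `OrderedDict` (class and method names are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """Bounded key-value store that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the LRU entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now most recently used
cache.put("c", 3)  # over capacity: evicts "b"
```

This is the same behavior Redis approximates with its `allkeys-lru` policy, applied to a single bounded map.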

2. Memory-Mapped Files

import mmap
import os

class MemoryMappedStorage:
    def __init__(self, filename, size_mb=1024):
        self.filename = filename
        self.size = size_mb * 1024 * 1024
        self.file_handle = None
        self.memory_map = None
        self.initialize_storage()
    
    def initialize_storage(self):
        """Initialize memory-mapped file"""
        # Create file if it doesn't exist
        if not os.path.exists(self.filename):
            with open(self.filename, 'wb') as f:
                f.write(b'\0' * self.size)
        
        # Open file and create memory map
        self.file_handle = open(self.filename, 'r+b')
        self.memory_map = mmap.mmap(
            self.file_handle.fileno(), 
            self.size, 
            access=mmap.ACCESS_WRITE
        )
    
    def write_at_offset(self, offset, data):
        """Write data at specific offset"""
        if offset + len(data) > self.size:
            raise ValueError("Data exceeds mapped region")
        
        self.memory_map[offset:offset+len(data)] = data
        self.memory_map.flush()  # Optional: force write to disk
    
    def read_at_offset(self, offset, length):
        """Read data from specific offset"""
        if offset + length > self.size:
            raise ValueError("Read exceeds mapped region")
        
        return self.memory_map[offset:offset+length]
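A short usage sketch of the same idea without the wrapper class, writing and reading a payload through `mmap` directly (the file path is illustrative):

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store.bin")
size = 4096

# Pre-allocate the backing file so the mapping has a fixed size.
with open(path, "wb") as f:
    f.write(b"\0" * size)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_WRITE)
    payload = b"hello-imdb"
    mm[0:len(payload)] = payload          # write at offset 0
    mm.flush()                            # push dirty pages to the file
    read_back = bytes(mm[0:len(payload)])
    mm.close()
```

Writes land in page cache at memory speed; the operating system (or an explicit `flush()`) takes care of moving dirty pages to disk.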

Persistence and Durability

1. Snapshots and Checkpointing

import pickle
import threading
import time

class SnapshotManager:
    def __init__(self, database, snapshot_interval=300):  # 5 minutes
        self.database = database
        self.snapshot_interval = snapshot_interval
        self.last_snapshot = time.time()
        self.snapshot_thread = None
        self.start_background_snapshots()
    
    def start_background_snapshots(self):
        """Take periodic snapshots in a background daemon thread"""
        def worker():
            while True:
                time.sleep(self.snapshot_interval)
                self.create_snapshot()
                self.last_snapshot = time.time()
        
        self.snapshot_thread = threading.Thread(target=worker, daemon=True)
        self.snapshot_thread.start()
    
    def create_snapshot(self, filename=None):
        """Create point-in-time snapshot of database"""
        if filename is None:
            filename = f"snapshot_{int(time.time())}.db"
        
        # Create consistent snapshot
        snapshot_data = {
            'timestamp': time.time(),
            'data': dict(self.database.data),  # Shallow copy of the top-level mapping
            'metadata': {
                'record_count': len(self.database.data),
                'memory_usage': self.database.current_memory_usage
            }
        }
        
        # Serialize to disk
        with open(filename, 'wb') as f:
            pickle.dump(snapshot_data, f, protocol=pickle.HIGHEST_PROTOCOL)
        
        return filename
    
    def restore_from_snapshot(self, filename):
        """Restore database from snapshot"""
        with open(filename, 'rb') as f:
            snapshot_data = pickle.load(f)
        
        self.database.data = snapshot_data['data']
        self.database.current_memory_usage = snapshot_data['metadata']['memory_usage']
        
        return snapshot_data['timestamp']
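A minimal round trip of the snapshot format, assuming the database state is a plain dict as above (paths and keys are illustrative):

```python
import os
import pickle
import tempfile
import time

# The "database" state to protect.
state = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}

snapshot = {
    "timestamp": time.time(),
    "data": dict(state),  # shallow copy of the top-level mapping
    "metadata": {"record_count": len(state)},
}

path = os.path.join(tempfile.mkdtemp(), "snapshot.db")
with open(path, "wb") as f:
    pickle.dump(snapshot, f, protocol=pickle.HIGHEST_PROTOCOL)

# Simulate a restart: restore state from the snapshot file.
with open(path, "rb") as f:
    restored = pickle.load(f)

recovered_state = restored["data"]
```

Anything written after the snapshot was taken is lost, which is why snapshots are usually paired with a write-ahead log, covered next.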

2. Write-Ahead Logging (WAL)

import os
import pickle
import struct
import time

class WriteAheadLog:
    def __init__(self, log_filename="database.wal"):
        self.log_filename = log_filename
        self.log_file = open(log_filename, 'ab')
        self.log_sequence_number = 0
    
    def log_operation(self, operation, key, value=None):
        """Log operation to WAL before applying"""
        self.log_sequence_number += 1
        
        log_entry = {
            'lsn': self.log_sequence_number,
            'timestamp': time.time(),
            'operation': operation,
            'key': key,
            'value': value
        }
        
        # Serialize and write to log
        log_data = pickle.dumps(log_entry)
        length_prefix = struct.pack('I', len(log_data))
        
        self.log_file.write(length_prefix + log_data)
        self.log_file.flush()  # Force write to disk
        os.fsync(self.log_file.fileno())  # Force OS to write to disk
        
        return self.log_sequence_number
    
    def replay_log(self, database, from_lsn=0):
        """Replay WAL entries to restore database state"""
        with open(self.log_filename, 'rb') as log_file:
            while True:
                # Read length prefix
                length_data = log_file.read(4)
                if len(length_data) < 4:
                    break
                
                entry_length = struct.unpack('I', length_data)[0]
                entry_data = log_file.read(entry_length)
                
                if len(entry_data) < entry_length:
                    break
                
                entry = pickle.loads(entry_data)
                
                if entry['lsn'] <= from_lsn:
                    continue
                
                # Replay operation
                if entry['operation'] == 'PUT':
                    database.data[entry['key']] = entry['value']
                elif entry['operation'] == 'DELETE':
                    database.data.pop(entry['key'], None)
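End to end, the framing above (a 4-byte native-order length prefix followed by a pickled entry) lets state be rebuilt purely from the log. A compact, self-contained sketch:

```python
import os
import pickle
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.wal")

entries = [
    {"lsn": 1, "operation": "PUT", "key": "a", "value": 1},
    {"lsn": 2, "operation": "PUT", "key": "b", "value": 2},
    {"lsn": 3, "operation": "DELETE", "key": "a", "value": None},
]

# Append each entry durably before it would be applied in memory.
with open(path, "ab") as log:
    for entry in entries:
        blob = pickle.dumps(entry)
        log.write(struct.pack("I", len(blob)) + blob)
        log.flush()
        os.fsync(log.fileno())

# Crash recovery: rebuild state from the log alone.
state = {}
with open(path, "rb") as log:
    while True:
        header = log.read(4)
        if len(header) < 4:
            break
        (length,) = struct.unpack("I", header)
        entry = pickle.loads(log.read(length))
        if entry["operation"] == "PUT":
            state[entry["key"]] = entry["value"]
        elif entry["operation"] == "DELETE":
            state.pop(entry["key"], None)
```

Replay is idempotent for these operations, so recovery can safely re-apply entries that were already reflected in a snapshot.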

Performance Optimizations

1. NUMA-Aware Memory Allocation

# Assumes a libnuma-style Python binding; exact call names vary
# between bindings, so treat this as an illustrative sketch.
import numa

class NUMAMemoryManager:
    def __init__(self):
        self.numa_nodes = numa.get_max_node() + 1
        self.node_allocators = {}
        
        for node in range(self.numa_nodes):
            self.node_allocators[node] = NodeAllocator(node)
    
    def allocate_on_node(self, size, preferred_node=0):
        """Allocate memory on specific NUMA node"""
        return self.node_allocators[preferred_node].allocate(size)
    
    def get_local_node(self):
        """Get the NUMA node mask for the current thread"""
        return numa.get_run_node_mask()

class NodeAllocator:
    def __init__(self, node_id):
        self.node_id = node_id
        self.allocated_blocks = []
    
    def allocate(self, size):
        """Allocate memory on this NUMA node"""
        # Use numa library to allocate on specific node
        memory_block = numa.alloc_onnode(size, self.node_id)
        self.allocated_blocks.append(memory_block)
        return memory_block

2. Lock-Free Data Structures

from threading import RLock

class HashNode:
    """Singly linked node used for hash-bucket chaining"""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node

class AtomicInteger:
    """Integer counter with lock-simulated atomic increment"""
    def __init__(self, value=0):
        self._value = value
        self._lock = RLock()
    
    def increment(self):
        with self._lock:
            self._value += 1

class LockFreeHashMap:
    def __init__(self, initial_capacity=16):
        self.capacity = initial_capacity
        self.buckets = [AtomicReference(None) for _ in range(initial_capacity)]
        self.size_counter = AtomicInteger(0)
    
    def put(self, key, value):
        """Lock-free put operation"""
        hash_code = hash(key)
        index = hash_code % self.capacity
        
        while True:
            current_head = self.buckets[index].get()
            new_node = HashNode(key, value, current_head)
            
            # Try to atomically update the bucket
            if self.buckets[index].compare_and_set(current_head, new_node):
                self.size_counter.increment()
                break
            
            # Retry if CAS failed (another thread modified the bucket)
    
    def get(self, key):
        """Lock-free get operation"""
        hash_code = hash(key)
        index = hash_code % self.capacity
        
        current = self.buckets[index].get()
        while current is not None:
            if current.key == key:
                return current.value
            current = current.next
        
        return None

class AtomicReference:
    """Reference with compare-and-set. CPython exposes no hardware CAS,
    so this simulates it with a lock; a real lock-free store would use
    native atomic instructions."""
    def __init__(self, initial_value):
        self._value = initial_value
        self._lock = RLock()
    
    def get(self):
        with self._lock:
            return self._value
    
    def compare_and_set(self, expected, new_value):
        with self._lock:
            if self._value == expected:
                self._value = new_value
                return True
            return False
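The same CAS retry loop drives other lock-free structures. A Treiber-style stack sketch, again simulating `compare_and_set` with a lock since CPython exposes no hardware CAS:

```python
from threading import RLock, Thread

class Atomic:
    """Lock-simulated atomic reference (illustrative, not truly lock-free)."""
    def __init__(self, value=None):
        self._value = value
        self._lock = RLock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new_value):
        with self._lock:
            if self._value is expected:
                self._value = new_value
                return True
            return False

class Node:
    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node

class TreiberStack:
    """Stack whose push is a single CAS on the head reference."""
    def __init__(self):
        self.head = Atomic(None)

    def push(self, value):
        while True:  # CAS retry loop
            current = self.head.get()
            if self.head.compare_and_set(current, Node(value, current)):
                return

stack = TreiberStack()
threads = [Thread(target=lambda: [stack.push(i) for i in range(250)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Walk the list: every push from every thread must be present.
count, node = 0, stack.head.get()
while node is not None:
    count, node = count + 1, node.next
```

No push is ever lost: a failed CAS simply means another thread won the race, and the loop retries against the new head.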

Real-World Examples

Redis Configuration

# Redis-like configuration
REDIS_CONFIG = {
    'maxmemory': '2gb',
    'maxmemory-policy': 'allkeys-lru',
    'save': '900 1',  # Snapshot every 15 minutes if at least 1 key changed
    'appendonly': 'yes',  # Enable AOF persistence
    'appendfsync': 'everysec'  # Fsync every second
}

SAP HANA Column Store

-- SAP HANA in-memory column store
CREATE COLUMN TABLE sales_data (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    sale_date DATE,
    amount DECIMAL(10,2),
    customer_id INTEGER
) PARTITION BY HASH (customer_id) PARTITIONS 8;

-- Create in-memory index
CREATE INDEX idx_sales_date ON sales_data (sale_date ASC);

Use Cases and Trade-offs

Best Use Cases:

  • Real-time analytics and reporting
  • Session stores and caching
  • Gaming leaderboards and real-time scoring
  • Financial trading systems
  • IoT data processing

Advantages:

  • Ultra-fast read/write performance (microsecond latency)
  • High throughput (millions of operations per second)
  • Fast, complex analytics on large in-memory datasets
  • Simplified architecture (no disk I/O bottlenecks)

Disadvantages:

  • Limited by available RAM
  • Higher cost per GB compared to disk storage
  • Durability challenges (data loss on power failure)
  • Cold start time after restarts
  • Memory fragmentation over time

Memory vs Disk Performance:

  • Memory access: ~100 nanoseconds
  • SSD access: ~100 microseconds (1000x slower)
  • HDD access: ~10 milliseconds (100,000x slower)
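A quick sanity check of those ratios:

```python
# Approximate access latencies, expressed in nanoseconds.
ram_ns = 100          # ~100 ns per memory access
ssd_ns = 100_000      # ~100 microseconds per SSD access
hdd_ns = 10_000_000   # ~10 milliseconds per HDD access

ssd_slowdown = ssd_ns // ram_ns   # SSD relative to RAM
hdd_slowdown = hdd_ns // ram_ns   # HDD relative to RAM
```

These are order-of-magnitude figures; real numbers vary by hardware, but the gap between media is consistently several orders of magnitude.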

In-memory databases provide exceptional performance for applications whose working set fits in memory, provided an appropriate durability strategy is in place for the use case.

Related Concepts

caching
durability
performance
redis
memcached

Used By

redis, sap-hana, oracle-timesten, voltdb