Distributed Systems

Agentic AI Systems: Reliable LLM Agents with Tools, Memory, and Guardrails

A comprehensive guide for software engineers to designing reliable LLM agent architectures with tool use, memory, and guardrails

Agentic AI Systems in the Cloud: LLM Workflows with Tools, Memory & Guardrails

A comprehensive guide for software engineers to designing LLM-powered workflows with tool use, memory, and guardrails

Essential Distributed Systems Patterns for Modern Applications

Explore key patterns like circuit breakers, bulkheads, and saga orchestration that make distributed systems resilient and scalable.

Distributed Cache

Learn how to design a highly available, distributed caching system with consistent hashing, replication, and eviction policies.

Medium

URL Shortener

Learn how to design a scalable URL shortening service with high availability, low latency, and analytics capabilities.

Medium

Achieving Data Consistency in Microservices Architecture

Strategies and patterns for managing data consistency across distributed microservices, including event sourcing and the saga pattern.

Apache Flink

Distributed stream processing framework for real-time analytics, event-driven applications, and complex event processing

Apache Kafka

Distributed streaming platform designed for high-throughput, real-time data pipelines and event-driven architectures

Apache Storm

Real-time computation system for processing unbounded streams of data with guaranteed message processing

Bigtable

Google's Distributed Storage System for Structured Data.

Byzantine Fault Tolerance

Understanding consensus algorithms that can tolerate Byzantine (arbitrary, including malicious) failures in distributed systems

40-50 minutes

advanced

CAP Theorem

Understanding the fundamental trade-offs in distributed systems design and implementation

25-35 minutes

intermediate

Circuit Breaker Pattern

Understanding fault tolerance and failure handling in distributed systems

20-30 minutes

intermediate

Consistent Hashing

Understanding data distribution and load balancing in distributed systems

25-35 minutes

intermediate

CQRS (Command Query Responsibility Segregation)

Separating read and write operations for scalable and maintainable distributed systems

30-40 minutes

advanced

Distributed Locks

Coordinating access to shared resources across distributed systems

25-35 minutes

intermediate

Dynamo

Amazon's distributed key-value storage system designed for high availability and eventual consistency.

Elasticsearch

Distributed search and analytics engine built on Apache Lucene for real-time search, logging, and data analytics

Exponential Backoff

Retry strategy that progressively increases delay between retry attempts to handle transient failures and prevent system overload

25-35 minutes

intermediate
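
As a rough illustration of the retry strategy described in this entry, here is a minimal Python sketch. The names `backoff_delays` and `retry`, the default parameters, and the choice of "full jitter" (waiting a random time between zero and the computed delay) are all assumptions made for this example and are not taken from the linked article.

```python
import random
import time


def backoff_delays(base=0.1, factor=2.0, cap=5.0, count=4):
    """Yield un-jittered delays base * factor**n, each limited to cap."""
    for n in range(count):
        yield min(cap, base * factor ** n)


def retry(operation, attempts=5, base=0.1, factor=2.0, cap=5.0, sleep=time.sleep):
    """Call operation until it succeeds or attempts (>= 1) are used up.

    Hypothetical helper: between failures it waits a random time in
    [0, delay] ("full jitter") so that many clients do not retry in lockstep.
    """
    delays = list(backoff_delays(base, factor, cap, attempts - 1))
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:  # a real caller would catch only transient errors
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, delays[attempt]))
```

The `sleep` parameter exists only so that the waiting can be replaced in tests. The cap keeps the delay from growing without bound after many failures.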

Google File System

Google's scalable distributed file system designed for large distributed data-intensive applications.

Gossip Protocols

Understanding epidemic-style information dissemination protocols for scalable and fault-tolerant distributed systems

30-40 minutes

intermediate

Horizontal Scaling

Scaling strategy that increases system capacity by adding more machines or instances rather than upgrading existing hardware

35-50 minutes

intermediate

Idempotency

Ensuring operations can be safely retried without unintended side effects

20-30 minutes

intermediate

Kafka

LinkedIn's distributed streaming platform designed for high-throughput, low-latency data streaming.

Leader Election

Understanding how distributed systems select a leader to coordinate operations and maintain consistency

30-40 minutes

intermediate

MapReduce

Google's programming model for simplified data processing on large clusters.

Optimistic Locking

Concurrency control mechanism that assumes conflicts are rare and handles them when they occur, using version numbers or timestamps to detect concurrent modifications

30-45 minutes

intermediate
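
The version-number check mentioned in this entry can be sketched in a few lines of Python. The `Store`, `Record`, and `VersionConflict` names are hypothetical and the store is a plain in-memory dictionary. A real database would perform the compare-and-update atomically, for example in a single conditional `UPDATE` statement.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


@dataclass
class Record:
    value: str
    version: int = 0


class Store:
    """Hypothetical in-memory store used only to illustrate the version check."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        row = self._rows.setdefault(key, Record(value=""))
        # Return a copy so callers cannot mutate the stored row directly.
        return Record(row.value, row.version)

    def write(self, key, value, expected_version):
        row = self._rows.setdefault(key, Record(value=""))
        # Reject the write if someone else updated the row since it was read.
        if row.version != expected_version:
            raise VersionConflict(
                f"expected v{expected_version}, found v{row.version}"
            )
        row.value = value
        row.version += 1
        return row.version


store = Store()
first = store.read("profile")
second = store.read("profile")          # both readers see version 0
store.write("profile", "A", first.version)
try:
    store.write("profile", "B", second.version)  # stale: version is now 1
except VersionConflict:
    pass  # the caller would normally re-read and retry
```

No lock is held between the read and the write. The losing writer finds out about the conflict only when it tries to commit, which is the trade-off this approach makes when conflicts are expected to be rare.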

Pub/Sub Pattern

Messaging pattern for decoupled communication between distributed systems

intermediate

Quorum Systems

Understanding quorum-based consensus mechanisms for ensuring consistency and availability in distributed systems

25-35 minutes

intermediate

Raft

Raft is a consensus algorithm for managing a replicated log that is easier to understand than Paxos while providing equivalent functionality and efficiency.

Secondary Index Partitioning

Global vs local secondary indexes in distributed systems

25-30 minutes

advanced

Split-Brain Prevention

Understanding techniques to prevent split-brain scenarios in distributed systems where multiple nodes believe they are the leader

25-35 minutes

intermediate

Top Apache Projects

Explore the most influential Apache Software Foundation projects used in modern distributed systems and data processing