Dataflow Engines
Apache Spark, Flink batch, and modern dataflow architectures
Distributed Join Algorithms
Sort-merge, hash, and broadcast joins in distributed systems
ETL vs ELT
Extract-Transform-Load vs Extract-Load-Transform patterns
MapReduce Fundamentals
Understanding the MapReduce programming model for big data