Lecture 4: Apache Spark and RDDs
Learning objectives:
In this lecture, you will learn:
- the RDD (Resilient Distributed Dataset), Spark's in-memory cluster computing abstraction, and how it differs from other in-memory data structures.
- the motivation, design, and architecture of Apache Spark.
- how Spark differs from MapReduce.
- the Spark APIs and the workflows of basic Spark applications (e.g., log debugging, PageRank).
 
Lecture slides
- Lec4: Spark: slides (PDF), slides + notes, pipeline + PageRank notes
 
Readings
- Lec4: The Spark paper
- Optional reading: PageRank algorithm (§2.1)