Lecture 4: Apache Spark and RDDs
Learning objectives:
In this lecture, you will learn:
- the in-memory cluster computing abstraction of RDDs (Resilient Distributed Datasets) and how it differs from other in-memory data structures.
- the motivation, design, and architecture of Apache Spark.
- how Spark is different from MapReduce.
- the APIs of Spark and the workflows of basic Spark applications (e.g., log debugging, PageRank).
Lecture slides
- Lec4: Spark: slides (PDF), slides + notes, pipeline + PageRank notes
Readings
- Lec4: The Spark paper
- Optional reading: PageRank algorithm (§2.1)