Lecture materials will be posted here
Slides, readings, demos, and lab links will be added to the Materials tab as they become available.
Summer 2026
Scalable data systems, parallel analytics, cloud infrastructure, and modern AI systems/infrastructure
Overview
Welcome to Big Data Systems. Scalable big data systems are a central part of modern data science. This course covers the design and use of parallel dataflow systems (MapReduce/Hadoop and Spark), scalable, parallel Python analytics frameworks, machine learning systems (Ray), and cloud data systems (cloud storage and large-scale ML infrastructure). A major component of the course is hands-on programming with scalable analytics tools and cloud resources on Amazon Web Services (AWS) or Google Cloud.
Syllabus
Calendar
| Date | Topic | Materials | Notes |
|---|---|---|---|
Lecture Materials
Assignments
Staff