Reading List
Entries further out are less concrete; the reading list is updated incrementally to include more papers as we go.
Big Data Systems
(required) The Google File System [ACM SOSP 2003]
(required) MapReduce: Simplified Data Processing on Large Clusters [USENIX OSDI 2004]
(required) Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [USENIX NSDI 2012]
Systems for ML and ML for Systems
Ray v2 Architecture (Ray white paper)
Ray: A Distributed Framework for Emerging AI Applications [USENIX OSDI 2018]
(required) The Case for Learned Index Structures [ACM SIGMOD 2018]
Cloud Computing and Storage Systems
(required) Cloud Programming Simplified: A Berkeley View on Serverless Computing [Tech report]
(required) Occupy the Cloud: Distributed Computing for the 99% [ACM SoCC 2017]
(required) Building and operating a pretty big storage system called S3 [Guest post from Andy Warfield, VP and distinguished engineer at AWS S3]
(required) Dynamo: Amazon’s Highly Available Key-value Store [ACM SOSP 2007]