Course Schedule
The course schedule is tentative and subject to change; dates further out are less concrete.
Introduction
Function-as-a-Service platforms & workloads
Week 3
- 09/08
  - Peeking Behind the Curtains of Serverless Platforms
- 09/10
  - Firecracker: Lightweight Virtualization for Serverless Applications
Cold starts
Week 4
- 09/17
  - Catalyzer: Sub-millisecond Startup for Serverless Computing with Initialization-less Booting
  - Benchmarking, Analysis, and Optimization of Serverless Function Snapshots
- Project Proposal due
Stateful serverless computing
Week 5
Serverless parallel computing & programming
Week 6
Week 7
- 10/06
  - Occupy the Cloud: Distributed Computing for the 99%
  - From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers
- 10/08
  - Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure
  - Wukong: A Scalable and Locality-Enhanced Framework for Serverless Parallel Computing
Serverless applications
Week 8
- 10/13
  - Reading day (no class)
- 10/15
  - Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads
Serverless storage
Week 9
- 10/22
  - Pocket: Elastic Ephemeral Storage for Serverless Analytics
  - InfiniCache: Exploiting Ephemeral Serverless Functions to Build a Cost-Effective Memory Cache
  - Boki: Stateful Serverless Computing with Shared Logs (optional)
  - Splinter: Bare-Metal Extensions for Multi-Tenant Low-Latency Storage (optional)
- Project Checkpoint due
LLM serving
Week 10
- 10/27
  - Efficient Memory Management for Large Language Model Serving with PagedAttention
  - InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
  - The Ultra-Scale Playbook: Training LLMs on GPU Clusters (optional but highly recommended)
- 10/29
  - Orca: A Distributed Serving System for Transformer-Based Generative Models
  - DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Week 11
- 11/03
  - Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
  - AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
- 11/05
  - SpotServe: Serving Generative Large Language Models on Preemptible Instances
  - Serving DNNs like Clockwork: Performance Predictability from the Bottom Up (optional)
Serverless AI
Week 12
- 11/10
  - ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
- 11/12
  - Towards Swift Serverless LLM Cold Starts with ParaServe
Week 13
- 11/17
  - Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
  - S-LoRA: Serving Thousands of Concurrent LoRA Adapters
  - Punica: Multi-Tenant LoRA Serving (optional)
- 11/19
  - BlitzScale: Fast and Live Large Model Autoscaling with O(1) Host Caching
  - PhoenixOS: Concurrent OS-level GPU Checkpoint and Restore with Validated Speculation
Week 14
- 11/24
  - Hack day (no class)
- 11/26
  - Thanksgiving recess (no class)
Wrapping up
Week 15
- 12/01
  - Project presentation I
- 12/03
  - Project presentation II
Week 16
- 12/08
  - Project presentation III
- 12/10
  - All project deliverables due