Lecture 14: Scaling LLM serving
Learning objectives:
In this lecture, you will:
- understand the challenges of serverless LLM serving and inference
- learn the basics of the binomial tree and binomial pipeline algorithms, and of dynamic pipeline-parallel inference (see the sketch after this list)
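As a rough illustration of the binomial tree idea referenced above, the sketch below computes the per-round send schedule for broadcasting data (e.g., model weights) among `n` nodes: in each round, every node that already holds the data forwards it to one new node, so the broadcast finishes in about log2(n) rounds. The function name and structure are illustrative assumptions, not code from the lecture materials.

```python
import math

def binomial_tree_schedule(n: int, root: int = 0):
    """Return the (sender, receiver) pairs for each round of a
    binomial-tree broadcast among n nodes rooted at `root`.

    In round k, every node that already holds the data sends it to the
    node 2**k positions away, so all n nodes are reached after
    ceil(log2(n)) rounds. (Illustrative sketch only.)
    """
    rounds = []
    num_rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for k in range(num_rounds):
        step = 2 ** k
        pairs = []
        for src in range(step):  # nodes that already have the data
            dst = src + step
            if dst < n:
                # shift virtual ranks so the tree is rooted at `root`
                pairs.append(((src + root) % n, (dst + root) % n))
        rounds.append(pairs)
    return rounds

if __name__ == "__main__":
    # For 8 nodes: round 0 sends 0->1; round 1 sends 0->2, 1->3;
    # round 2 sends 0->4, 1->5, 2->6, 3->7.
    for r, pairs in enumerate(binomial_tree_schedule(8)):
        print(f"round {r}: {pairs}")
```

The binomial pipeline variant discussed in the lecture additionally splits the data into chunks and pipelines them through the tree; that extension is not shown here.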
Lecture slides
- Lec14: Scaling LLM serving: slides pdf
Recordings
- Lec14: video