
Lecture 14: Scaling LLM serving

Learning objectives:

In this lecture, you will:

  • understand the challenges of serverless LLM model serving and inference
  • learn the basics of the binomial tree and binomial pipeline algorithms, and of dynamic pipeline-parallel inference
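As background for the second objective: a binomial tree broadcast spreads data from one node to all others in ceil(log2(P)) rounds, because every node that already holds the data forwards it each round, doubling the set of holders. The sketch below is illustrative only (the function name `binomial_tree_schedule` and its interface are assumptions, not taken from the lecture); it computes the per-round (sender, receiver) pairs for a broadcast rooted at node 0.

```python
def binomial_tree_schedule(num_nodes):
    """Per-round (sender, receiver) pairs for a binomial-tree broadcast
    rooted at node 0. Each round, every holder sends to the node at the
    current offset, so holders double and the broadcast finishes in
    ceil(log2(num_nodes)) rounds."""
    rounds = []
    have = {0}    # nodes that currently hold the data
    offset = 1    # distance each holder sends across this round
    while len(have) < num_nodes:
        pairs = [(src, src + offset) for src in sorted(have)
                 if src + offset < num_nodes]
        have.update(dst for _, dst in pairs)
        rounds.append(pairs)
        offset *= 2
    return rounds

# For 8 nodes: 3 rounds, e.g. round 0 is [(0, 1)],
# and 7 point-to-point transfers in total.
schedule = binomial_tree_schedule(8)
```

Because every holder transmits in every round, all links are kept busy; this is the same property the binomial *pipeline* exploits when chunking large model checkpoints across servers.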

Lecture slides

Recordings


© 2025 Yue Cheng. Released under the CC BY-SA license