
Lecture 14: Scaling LLM serving

Learning objectives:

In this lecture, you will:

  • understand the challenges of serverless LLM model serving and inference
  • learn the basics of the binomial tree and binomial pipeline algorithms, and of dynamic pipeline-parallel inference
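As background for the second objective: a binomial tree broadcast spreads data from one node to all others in ceil(log2(P)) rounds, because every node that already holds the data forwards it each round, doubling the set of holders. The sketch below is illustrative only (the function name `binomial_tree_schedule` and its interface are assumptions, not taken from the lecture); it computes the per-round (sender, receiver) pairs for a broadcast rooted at node 0.

```python
def binomial_tree_schedule(num_nodes):
    """Per-round (sender, receiver) pairs for a binomial-tree broadcast
    rooted at node 0. Each round, every holder sends to the node at the
    current offset, so holders double and the broadcast finishes in
    ceil(log2(num_nodes)) rounds."""
    rounds = []
    have = {0}    # nodes that currently hold the data
    offset = 1    # distance each holder sends across this round
    while len(have) < num_nodes:
        pairs = [(src, src + offset) for src in sorted(have)
                 if src + offset < num_nodes]
        have.update(dst for _, dst in pairs)
        rounds.append(pairs)
        offset *= 2
    return rounds

# For 8 nodes: 3 rounds, e.g. round 0 is [(0, 1)],
# and 7 point-to-point transfers in total.
schedule = binomial_tree_schedule(8)
```

Because every holder transmits in every round, all links are kept busy; this is the same property the binomial *pipeline* exploits when chunking large model checkpoints across servers.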

Lecture slides

Recordings


© 2025 Yue Cheng. Released under the CC BY-SA license