Yue Cheng

mrz7dp _AT_ virginia.edu

AI Systems Researcher

I am an Associate Professor of Data Science and Computer Science at the University of Virginia. My research covers a range of topics including distributed systems, serverless and cloud computing, storage systems, operating systems, and high-performance computing. My current research focuses on designing scalable, high-performance, and easy-to-use computer systems that manage and process huge volumes of data.

Currently I am working on: (1) Storage for AI: rethinking storage system designs and data reduction techniques for AI applications; (2) Serverless + AI: making AI applications (LLM serving, emerging multimodal apps) fundamentally elastic; (3) Serverless and FaaS: improving serverless computing using an end-to-end approach that cuts across the entire ecosystem stack: applications, frameworks, platforms, and OS.

I am the recipient of an NSF CAREER Award (2021), an Amazon Research Award (2021), a Meta Research Award (2022), the IEEE CS TCHPC Early Career Researchers Award for Excellence in HPC (2022), and a Samsung GRO 2023 Award (2023). I was ranked among Stanford’s World’s Top 2% Scientists in 2024. I received my Ph.D. degree in Computer Science from Virginia Tech. During my Ph.D., I spent two summers at IBM Research Almaden in 2013 and 2014, and six months at Dell EMC Princeton Office in 2015, all on storage systems.

selected projects

Most of my projects are open-source and available on our group’s GitHub page.

Our recent focus is on:

Designing first-gen Serverless AI platforms for Large Language Model (LLM) applications,
Rethinking storage system design in the era of Generative AI and LLMs.

I’m looking for motivated graduate/undergrad interns interested in conducting research in cutting-edge LLM systems areas (serverless AI, LLM agents, storage for ML/AI models/datasets). Please fill out this form if you are interested! Also feel free to reach out via email.

For our most recent projects, check our latest preprints and publications.

Serverless AI: Interactive ML/AI workloads require elastic access to heterogeneous compute resources (GPU, CPU). We explore new serverless execution paradigms to enable efficient GPU utilization and scalable, elastic GPU management for LLM fine-tuning and inference. NotebookOS provides on-demand GPUs for Jupyter Notebook-based interactive training workloads. ZenFlow accelerates LLM fine-tuning by prioritizing and decoupling parameter updates across fast GPU and slow CPU, minimizing GPU stalls while preserving accuracy. ZenFlow has been adopted into DeepSpeed. MorphServe enables flexible and elastic GPU memory scaling for bursty LLM inference workloads via dynamic model layer quantization and KV cache resizing.
Storage Systems for AI: We are rethinking storage system design to keep pace with the exponential growth of AI data. ZipLLM and BitX are new lossless compression algorithms that reduce the LLM storage footprint by 50%. ELF and ELVES near-losslessly compress ML models to achieve effective model storage reduction. SHADE and FedCaSe automatically and intelligently cache the most important training samples without losing training quality.
- ZipLLM [NSDI’26]
- ELF [VLDB’24]: [GitHub]
- SoCC’24: [GitHub]
- FAST’23: [GitHub]
FaaS Platform Management: We design innovative systems solutions to make FaaS truly elastic. This includes a highly scalable container provisioning framework that can provision thousands of 10+ GB serverless function containers in just a few seconds. FaaSNet [ATC’21] and CIDRE [ASPLOS’25] are both deployed at Alibaba Function Compute.
- CIDRE [ASPLOS’25]: [GitHub]
- FaaSNet [ATC’21]: [GitHub] [Alibaba Cloud Blog]
Serverless Cloud Storage: Storing large and small objects on a dynamic fleet of serverless functions at only 3% of ElastiCache’s cost, without sacrificing performance or availability.
- λFS [ASPLOS’23]: [GitHub]
- InfiniStore [VLDB’23]: [GitHub]
- InfiniCache [FAST’20]: [GitHub]
Serverless Parallel Computing: Scaling out Python parallel programs (e.g., Dask applications) on FaaS without worrying about tedious cluster management. Wukong uses a new decentralized scheduling technique, which decentralizes resource orchestration to each individual serverless function, thereby enabling high elasticity and high scalability.
- SoCC’20: [GitHub]
- PDSW’19
Serverless Function OS Scheduling: Linux CFS is not ideal for short-lived serverless function workloads. This project rethinks OS scheduling to minimize function turnaround time.
- ALPS [ATC’24]: [GitHub]
- SFS [SC’22]: [GitHub]

news

Feb 2026	We have released BitX, a high-performance, lossless compressor for safetensors model files (paper, Rust crate). BitX achieves 10+ GB/s compression throughput, making it practical for large-scale model storage and transfer. We’ve also open-sourced ZipLLM, our model-aware data reduction pipeline. Visit the project website to learn more. Check them out and share your feedback!
Jan 2026	🎉 Two papers accepted to MLSys’26. Congrats to Zhaoyuan and Rui!
Dec 2025	Thrilled to receive a CCI award on developing secure agentic AI serving infrastructure (w/ Songqing Chen from GMU). Thanks, CCI!
Aug 2025	🚀 ZenFlow has now been officially adopted into DeepSpeed and is also featured on the PyTorch blog.
Aug 2025	🎉 Congrats to Ruizhe and Yuqi on the acceptance of their paper to IMC 2025!
Aug 2025	📢 New Course Alert: I’ll be teaching CS6501 Serverless AI in Fall 25! This course features a hands-on project powered by AI coding assistance.
Jul 2025	🎓 Ben Carver successfully defended his Ph.D. dissertation. Congratulations, Dr. Carver!! Ben will be joining the AI Networking Infrastructure team @ Meta (NYC) as a Research Scientist, where he’ll work on cutting-edge infrastructure to support next-gen AI.
Jul 2025	🎉 Congrats to Zirui, Tingfeng, and Zhaoyuan on the acceptance of ZipLLM to NSDI 2026! We analyzed all publicly available LLM repos on Hugging Face and built effective, efficient data reduction algorithms tailored for massive-scale LLM storage. Stay tuned for more updates!
Jun 2025	🎉 Congrats to Ben and Jingyuan on the acceptance of NotebookOS to ASPLOS 2026! In this work, we built a GPU-efficient distributed Notebook platform that enables on-demand GPU allocation for interactive training workloads such as LLM fine-tuning. Stay tuned for more updates on the project!
Jan 2025	🎉 Congrats to Qichang on the acceptance of CIDRE to ASPLOS 2025! This paper systematically studies the challenges of concurrent serverless function invocations and presents a novel function container orchestration algorithm that speculatively chooses between a delayed warm start and a cold start.
Jan 2025	Congrats to Ruizhe on the IPFS data management work accepted to WWW 2025!
Sep 2024	Congrats to Redwan on FedCaSe, our work on federated learning I/O caching and scheduling, being accepted to SoCC 2024!
Sep 2024	👋 A warm welcome to our newest members: Zirui Wang and Tingfeng Lan!
Sep 2024	Thrilled to receive an NSF CSSI Elements grant on developing a sustainable and GPU-efficient cyberinfrastructure for Notebooks (w/ Co-PI Geoffrey Fox). Thanks, NSF!
Jul 2024	Excited to receive an NSF REU Site grant (lead PI: Claudia Scholz). Thanks, NSF!
Jun 2024	Congrats to Yuqi and Ruizhe on ALPS accepted to USENIX ATC 2024! ALPS learns workload intelligence from the user space to inform serverless function scheduling in the kernel space.
May 2024	This summer Yuqi will be doing a student researcher internship at Google and Zhaoyuan will be doing a research internship at Samsung. Congrats!
Apr 2024	Excited to receive an NSF OAC Core grant on building a distributed graph learning cyberinfrastructure for large spatiotemporal prediction (w/ Liang Zhao from Emory). Thanks, NSF!
Mar 2024	Congrats to Ruizhe on the IPFS analysis work accepted to SIGMETRICS 2024! We answered questions about accessibility, content, and performance of IPFS in this research.
Mar 2024	Congrats to Zhaoyuan and Zirui on their work accepted to VLDB 2024! In this work, Zhaoyuan analyzed a large dataset of real-world pre-trained ML models collected from Hugging Face. Based on this analysis, he designed a new storage compression method for reducing the storage requirement of pre-trained models at scale.
Feb 2024	Congrats to Rui on his work accepted to VLDB 2024! In this work, Rui systematically studied the algorithmic complexity vulnerabilities of dynamic learned indexes.
Jan 2024	Check our latest survey on resource-efficient LLMs.
Oct 2023	Excited to receive a Samsung GRO 2023 Award on New Storage for Large ML Training (w/ Ali Anwar from UMN). Thanks, Samsung Advanced Institute of Technology and Samsung Memory Solutions Lab, for the generous support on our research!
Oct 2023	Serving as the general co-chair of ACM HotStorage’24. Consider submitting your exciting early ideas!
Jun 2023	🎓 My first Ph.D. student Jingyuan Zhang successfully defended his Ph.D. dissertation. Congratulations, Dr. Zhang! Jingyuan will be joining the cloud-native infrastructure team @ ByteDance (San Jose, CA).
Apr 2023	Congrats to Ben, Runzhou, and Jingyuan on the acceptance of λFS to ASPLOS 2023! The acceptance of λFS at ASPLOS’23 marks yet another significant milestone in our serverless storage project series. Don’t forget to check out our projects: Episode I - InfiniCache, Episode II - InfiniStore, and our latest work, Episode III - λFS.
Feb 2023	Congrats to Jingyuan, Ben, and the team on the acceptance of InfiniStore to VLDB 2023!
Dec 2022	Congrats to Redwan, Ahmad, and Yuqi on their paper on deep learning I/O caching accepted to FAST 2023!
Sep 2022	I am honored to be selected for the 2022 IEEE CS TCHPC Early Career Researchers Award for Excellence in High Performance Computing.
Sep 2022	Congrats to Zhaoyuan on his paper accepted to DRBSD-8 co-located with SC 2022!
Sep 2022	Excited to receive a Meta Research Award for AI System Hardware/Software Codesign. Thanks, Meta Research!
Aug 2022	In Fall ’22, I am joining the School of Data Science and the Department of Computer Science at the University of Virginia.
Jul 2022	SFS is nominated as a Best Student Paper Award Finalist at SC 2022! Congrats to Yuqi!
Jun 2022	Congrats to Yuqi on his paper on serverless function scheduling accepted to SC 2022!
May 2022	This summer my students will intern at MSR (Ben Carver), ByteDance (Yuqi Fu, Jingyuan Zhang), and Argonne National Lab (Zhaoyuan Su)! Congrats!
May 2022	🏆 Thrilled to receive an Outstanding Teaching Award from CS @ Mason!
Aug 2021	Congrats to Li and Haoliang on rKube accepted to SoCC 2021!
Aug 2021	A collaborative FMSG grant funded by NSF (with Jia Liu @ Auburn). Thanks, NSF!
Jun 2021	Congrats to Zheng on FedAT accepted to SC 2021!
Apr 2021	Congrats to Ao on FaaSNet accepted to USENIX ATC 2021!
Mar 2021	Honored to receive a gift from Adobe Research for our work on serverless computing! Thanks, Adobe!
Feb 2021	Thrilled to receive an NSF CAREER Award for my work on building serverless cloud storage infrastructure. Thanks, NSF!
Oct 2020	Excited to receive an Amazon Research Award with Liang Zhao from Emory!
Aug 2020	Congrats to Junxiang and Zheng on their paper getting accepted to IEEE ICDM 2020!
Aug 2020	Congrats to Ben, Jingyuan, and Ao on Wukong getting accepted by ACM SoCC 2020! Wukong is a super-fast serverless parallel computing framework built atop AWS Lambda. Wukong achieves up to 68X speedup over state-of-the-art serverless parallel processing frameworks. The Wukong project is online. We are happy to accept contributions!
Jul 2020	Two projects got funded by NSF. With the new MRI grant, we will be building a new HPC infrastructure to support the growing computing needs of Mason users. With an OAC grant, we will be building a new model-parallel deep learning training infrastructure. Thanks, NSF!
Mar 2020	Congrats to Zheng, Ahsan, and Syed on TiFL getting accepted to ACM HPDC 2020!
Dec 2019	Congrats to Ao, Jingyuan, and Xiaolong on InfiniCache getting accepted to USENIX FAST 2020! InfiniCache is a first-of-its-kind, cost-effective object cache that is built atop ephemeral cloud functions. InfiniCache is 31-96x cheaper than existing cloud cache services (e.g., AWS ElastiCache) while offering the same or better performance. Fork InfiniCache on GitHub.

selected/recent publications

Preprint

ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates

Tingfeng Lan, Yusen Wu, Bin Ma, Zhaoyuan Su, Rui Yang, Tekin Bicer, Masahiro Tanaka, Olatunji Ruwase, Dong Li, and Yue Cheng

In Preprints

arXiv
MLSys’26

λScale: Enabling Fast Scaling for Serverless Large Language Model Inference

Minchen Yu, Rui Yang, Chaobo Jia, Zhaoyuan Su, Sheng Yao, Tingfeng Lan, Yuchen Yang, Yue Cheng, Wei Wang, Ao Wang, and Ruichuan Chen

In Ninth Annual Conference on Machine Learning and Systems (to appear) 2026

arXiv
MLSys’26

MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing

Zhaoyuan Su, Zeyu Zhang, Tingfeng Lan, Zirui Wang, Haiying Shen, Juncheng Yang, and Yue Cheng

In Ninth Annual Conference on Machine Learning and Systems (to appear) 2026

arXiv
NSDI’26

ZipLLM: Efficient LLM Storage via Model-Aware Synergistic Data Deduplication and Compression

Zirui Wang, Tingfeng Lan, Zhaoyuan Su, Juncheng Yang, and Yue Cheng

In 23rd USENIX Symposium on Networked Systems Design and Implementation (to appear) 2026

arXiv
ASPLOS’26

NotebookOS: A Replicated Notebook Platform for Interactive Training with On-Demand GPUs

Benjamin Carver, Jingyuan Zhang, Haoliang Wang, Kanak Mahadik, and Yue Cheng

In 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (to appear) 2026

arXiv Code
IMC’25

The Decentralization Dilemma: Performance Trade-Offs in IPFS and Breakpoints

Ruizhe Shi, Yuqi Fu, Ruizhi Cheng, Bo Han, Yue Cheng, and Songqing Chen

In The ACM Internet Measurement Conference 2025

PDF
ASPLOS’25

Concurrency-Informed Orchestration for Serverless Functions

Qichang Liu, Yue Cheng, Haiying Shen, Ao Wang, and Bharathan Balaji

In 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 2025

PDF Code
WWW’25

Centralization in Decentralized Web: Challenges and Opportunities in IPFS Data Management

Ruizhe Shi, Ruizhi Cheng, Yuqi Fu, Bo Han, Yue Cheng, and Songqing Chen

In The 2025 ACM Web Conference

PDF
SoCC’24

FedCaSe: Enhancing Federated Learning with Heterogeneity-aware Caching and Scheduling

Redwan Ibne Seraj Khan, Arnab K. Paul, Yue Cheng, Xun Jian, and Ali R. Butt

In Proceedings of the ACM Symposium on Cloud Computing 2024

PDF Code
VLDB’24

Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask

Zhaoyuan Su, Ammar Ahmed, Zirui Wang, Ali Anwar, and Yue Cheng

In 50th International Conference on Very Large Data Bases 2024

arXiv PDF Code
VLDB’24

Algorithmic Complexity Attacks on Dynamic Learned Indexes

Rui Yang, Evgenios M. Kornaropoulos, and Yue Cheng

In 50th International Conference on Very Large Data Bases 2024

arXiv PDF Code
USENIX ATC’24

ALPS: An Adaptive Learning, Priority OS Scheduler for Serverless Functions

Yuqi Fu, Ruizhe Shi, Haoliang Wang, Songqing Chen, and Yue Cheng

In 2024 USENIX Annual Technical Conference (USENIX ATC 24) 2024

PDF Code Talk
SIGMETRICS’24

A Closer Look into IPFS: Accessibility, Content, and Performance

Ruizhe Shi, Ruizhi Cheng, Bo Han, Yue Cheng, and Songqing Chen

In ACM SIGMETRICS / IFIP Performance 2024

PDF
ASPLOS’23

λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions

Benjamin Carver, Runzhou Han, Jingyuan Zhang, Mai Zheng, and Yue Cheng

In 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 2023

arXiv PDF Code
VLDB’23

InfiniStore: Elastic Serverless Cloud Storage

Jingyuan Zhang, Ao Wang, Xiaolong Ma, Benjamin Carver, Nicholas John Newman, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Vasily Tarasov, Feng Yan, and Yue Cheng

In 49th International Conference on Very Large Data Bases 2023

arXiv PDF Code
USENIX FAST’23

SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

Redwan Ibne Seraj Khan, Ahmad Hossein Yazdani, Yuqi Fu, Arnab K. Paul, Bo Ji, Xun Jian, Yue Cheng, and Ali R. Butt

In 21st USENIX Conference on File and Storage Technologies (FAST 23) 2023

PDF Code Talk
SC’22

SFS: Smart OS Scheduling for Serverless Functions

Yuqi Fu, Li Liu, Haoliang Wang, Yue Cheng, and Songqing Chen

In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2022

Best Student Paper Award Finalist
arXiv PDF Code
USENIX ATC’21

FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute

Ao Wang, Shuai Chang, Huangshi Tian, Hongqi Wang, Haoran Yang, Huiba Li, Rui Du, and Yue Cheng

In 2021 USENIX Annual Technical Conference (USENIX ATC 21) 2021

PDF Code Talk
SoCC’20

Wukong: A Scalable and Locality-Enhanced Framework for Serverless Parallel Computing

Benjamin Carver, Jingyuan Zhang, Ao Wang, Ali Anwar, Panruo Wu, and Yue Cheng

In Proceedings of the 11th ACM Symposium on Cloud Computing 2020

Abstract arXiv PDF Code Talk

Executing complex, burst-parallel, directed acyclic graph (DAG) jobs poses a major challenge for serverless execution frameworks, which will need to rapidly scale and schedule tasks at high throughput, while minimizing data movement across tasks. We demonstrate that, for serverless parallel computations, decentralized scheduling enables scheduling to be distributed across Lambda executors that can schedule tasks in parallel, and brings multiple benefits, including enhanced data locality, reduced network I/Os, automatic resource elasticity, and improved cost effectiveness. We describe the implementation and deployment of our new serverless parallel framework, called Wukong, on AWS Lambda. We show that Wukong achieves near-ideal scalability, executes parallel computation jobs up to 68.17X faster, reduces network I/O by multiple orders of magnitude, and achieves 92.96% tenant-side cost savings compared to numpywren.
USENIX FAST’20

InfiniCache: Exploiting Ephemeral Serverless Functions to Build a Cost-Effective Memory Cache

Ao Wang, Jingyuan Zhang, Xiaolong Ma, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Vasily Tarasov, Feng Yan, and Yue Cheng

In 18th USENIX Conference on File and Storage Technologies (FAST 20) 2020

PDF Talk Website Press Blog