Online Course

Machine Learning Compiler

July 2022
Tianqi Chen

A comprehensive course on machine learning compilation, offering the first systematic treatment of key elements in this emerging field: ML programming abstractions, learning-driven search, compilation optimizations, and optimized library runtimes. The course covers deploying AI models across diverse production environments and addresses challenges in both training and inference workloads; a small framework-free sketch of a tensor program follows the topic list below.

Key Topics Covered:

  • Introduction to ML Compilation
  • Tensor Program Abstraction
  • End to End Model Execution
  • Automatic Program Optimization
  • Integration with ML Frameworks
  • GPU and Hardware Acceleration
  • Computational Graph Optimization
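
The following is a minimal sketch of what a tensor program abstraction makes explicit: a matmul written as plain loops, plus a loop-tiled variant of the kind an automatic program optimizer searches over. It uses ordinary Python/NumPy rather than the course's TVMScript/TensorIR syntax; the function names and tile size are illustrative only.

```python
import numpy as np

def matmul_naive(A, B):
    """Reference tensor program: a (M, K) x (K, N) matmul as explicit loops."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i, j] += A[i, k] * B[k, j]
    return C

def matmul_tiled(A, B, tile=16):
    """The same computation after a loop-tiling transformation.

    Tiling the i/j loops is one point in the schedule space an automatic
    optimizer explores (tile sizes, loop orders, vectorization, ...);
    the numerical result is unchanged.
    """
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k in range(K):
                for i in range(i0, min(i0 + tile, M)):
                    for j in range(j0, min(j0 + tile, N)):
                        C[i, j] += A[i, k] * B[k, j]
    return C

if __name__ == "__main__":
    A = np.random.rand(64, 32).astype("float32")
    B = np.random.rand(32, 48).astype("float32")
    np.testing.assert_allclose(matmul_naive(A, B), matmul_tiled(A, B), rtol=1e-5)
```

The two programs compute identical results; what changes is the loop structure, and with it locality and parallelization opportunities, which is exactly the schedule space that automatic program optimization explores.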

ICML 2024 Tutorial

Towards Efficient Generative Large Language Model Serving: A Tutorial from Algorithms to Systems

July 2024
Xupeng Miao, Zhihao Jia
Vienna, Austria

This tutorial surveys efficient serving of generative large language models from algorithms to systems, covering both algorithmic approaches and system-level optimizations. We discuss key techniques for improving inference performance, reducing latency, and optimizing resource utilization across diverse hardware platforms; a toy sketch of speculative decoding follows the topic list below.

Key Topics Covered:

  • LLM serving architectures
  • Speculative inference techniques
  • Memory optimization strategies
  • Distributed serving systems
  • Hardware acceleration
  • Performance benchmarking
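
As a concrete illustration of speculative inference, the sketch below shows the draft-then-verify loop in its simplest greedy form: a cheap draft model proposes a few tokens and the expensive target model accepts them while it agrees. `draft_model` and `target_model` are hypothetical stand-ins for real next-token predictors; a production system would verify all drafted positions in a single batched target forward pass and use a probabilistic acceptance rule when sampling.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins: each maps a token prefix to the model's next token.
NextToken = Callable[[Sequence[int]], int]

def speculative_decode(prefix: List[int],
                       draft_model: NextToken,
                       target_model: NextToken,
                       num_draft: int = 4,
                       max_new_tokens: int = 32) -> List[int]:
    """Greedy speculative decoding: the cheap draft model proposes `num_draft`
    tokens, the expensive target model verifies them, and every accepted draft
    token saves one target decoding step."""
    tokens = list(prefix)
    while len(tokens) < len(prefix) + max_new_tokens:
        # 1) Draft phase: propose a short continuation with the cheap model.
        draft, ctx = [], list(tokens)
        for _ in range(num_draft):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)

        # 2) Verify phase: accept drafted tokens while the target model agrees.
        #    (A real system scores all drafted positions in one target forward pass.)
        for t in draft:
            expected = target_model(tokens)
            if expected == t:
                tokens.append(t)          # accepted draft token
            else:
                tokens.append(expected)   # rejected: fall back to the target's token
                break
        else:
            # All drafts accepted: the final verification yields one bonus token.
            tokens.append(target_model(tokens))
    return tokens[: len(prefix) + max_new_tokens]
```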

SIGMOD 2024 Tutorial

Efficient Systems for Large Language Model Serving

June 2024
Xupeng Miao, Zhihao Jia, Bin Cui
Santiago, Chile

This tutorial focuses on efficient systems for large language model serving, presenting state-of-the-art techniques for optimizing LLM inference in database and data-management contexts. We cover system design principles, optimization strategies, and practical deployment considerations for serving LLMs at scale; a toy KV-cache block-allocation sketch follows the topic list below.

Key Topics Covered:

  • LLM serving system design
  • Query optimization for LLMs
  • Resource management
  • Scalability techniques
  • Integration with databases
  • Performance evaluation
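
One recurring resource-management idea in LLM serving systems is paged KV-cache allocation, popularized by PagedAttention-style designs: GPU memory is split into fixed-size blocks and mapped to sequences on demand instead of reserving worst-case space per request. The sketch below is a toy allocator in that spirit; the class name, block size, and methods are illustrative and do not correspond to any particular system's API.

```python
class KVCacheBlockManager:
    """Toy paged KV-cache allocator: memory is carved into fixed-size blocks,
    and each request maps its growing KV cache onto a per-request block table
    instead of reserving max-sequence-length memory up front."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size              # tokens cached per block
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}                    # request id -> list of block ids

    def can_admit(self, prompt_len: int) -> bool:
        """Admission control: is there room for this request's prompt?"""
        needed = -(-prompt_len // self.block_size)   # ceiling division
        return needed <= len(self.free_blocks)

    def allocate(self, request_id: str, prompt_len: int) -> None:
        """Reserve enough blocks to hold the prompt's KV cache."""
        needed = -(-prompt_len // self.block_size)
        self.block_tables[request_id] = [self.free_blocks.pop() for _ in range(needed)]

    def append_token(self, request_id: str, cached_len: int) -> None:
        """Grow a request by one token; grab a new block only at block boundaries.

        `cached_len` is the number of tokens already cached for the request."""
        if cached_len % self.block_size == 0:
            self.block_tables[request_id].append(self.free_blocks.pop())

    def free(self, request_id: str) -> None:
        """Return all of a finished request's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(request_id))
```

A continuous-batching scheduler would call `can_admit` before adding a request to the running batch and `free` as soon as the request finishes, which is what keeps memory utilization high under bursty workloads.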

ASPLOS 2025 Tutorial

GenAI Catalyst: Efficient Systems and Compilers for Generative AI

March 30, 2025
Xupeng Miao, Zhihao Jia
Rotterdam, Netherlands

This tutorial provides a comprehensive overview of state-of-the-art techniques for designing and implementing systems and compilers that optimize the performance of generative AI models, especially large language models. We present Mirage, a multi-level superoptimization-based tensor program compiler; FlexFlow Serve, for low-latency LLM serving; and FlexLLM, for memory-efficient LLM finetuning. A toy illustration of the superoptimization idea follows the topic list below.

Key Topics Covered:

  • Mirage: Multi-level superoptimization compiler
  • FlexFlow Serve: Distributed LLM serving
  • FlexLLM: Memory-efficient finetuning
  • Tensor program optimization
  • Speculative inference techniques
  • GPU kernel generation
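
To give a feel for the superoptimization idea behind compilers like Mirage (without reproducing its actual API), the toy sketch below enumerates two algebraically equivalent ways to evaluate a small tensor expression, checks that they agree on random inputs, and picks the cheaper one under a simple FLOP cost model. Real multi-level superoptimizers search a far larger space of graph- and kernel-level candidates, but the select-the-cheapest-verified-program loop is the same in spirit.

```python
import numpy as np

def flops_left(m, k, n, p):
    """FLOPs for (A @ B) @ C with A:(m,k), B:(k,n), C:(n,p)."""
    return 2 * m * k * n + 2 * m * n * p

def flops_right(m, k, n, p):
    """FLOPs for A @ (B @ C)."""
    return 2 * k * n * p + 2 * m * k * p

def optimize_chain(A, B, C):
    """Toy superoptimization loop: both candidate programs compute the same
    result; verify equivalence on the given (random) inputs, then pick the
    cheaper candidate under a FLOP cost model."""
    m, k = A.shape
    _, n = B.shape
    _, p = C.shape
    candidates = {
        "(A @ B) @ C": (lambda: (A @ B) @ C, flops_left(m, k, n, p)),
        "A @ (B @ C)": (lambda: A @ (B @ C), flops_right(m, k, n, p)),
    }
    # Equivalence check on random data (up to floating-point rounding),
    # echoing the probabilistic verification used by real superoptimizers.
    outputs = {name: fn() for name, (fn, _) in candidates.items()}
    reference = next(iter(outputs.values()))
    for out in outputs.values():
        np.testing.assert_allclose(out, reference, rtol=1e-3)
    best_name, (_, best_flops) = min(candidates.items(), key=lambda kv: kv[1][1])
    return best_name, best_flops

if __name__ == "__main__":
    A = np.random.rand(512, 16).astype("float32")
    B = np.random.rand(16, 512).astype("float32")
    C = np.random.rand(512, 8).astype("float32")
    name, flops = optimize_chain(A, B, C)
    print(f"picked {name} at ~{flops:,} FLOPs")   # A @ (B @ C) wins for these shapes
```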