Online Course

Machine Learning Compiler

July 2022
Tianqi Chen

A comprehensive course on machine learning compilation, offering the first systematic treatment of key elements in this emerging field: ML programming abstractions, learning-driven search, compilation optimizations, and optimized library runtimes. The course covers deploying AI models across diverse production environments and addresses challenges in both training and inference workloads; a small framework-free sketch of a tensor program follows the topic list below.

Key Topics Covered:

  • Introduction to ML Compilation
  • Tensor Program Abstraction
  • End to End Model Execution
  • Automatic Program Optimization
  • Integration with ML Frameworks
  • GPU and Hardware Acceleration
  • Computational Graph Optimization
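
The following is a minimal sketch of what a tensor program abstraction makes explicit: a matmul written as plain loops, plus a loop-tiled variant of the kind an automatic program optimizer searches over. It uses ordinary Python/NumPy rather than the course's TVMScript/TensorIR syntax; the function names and tile size are illustrative only.

```python
import numpy as np

def matmul_naive(A, B):
    """Reference tensor program: a (M, K) x (K, N) matmul as explicit loops."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i, j] += A[i, k] * B[k, j]
    return C

def matmul_tiled(A, B, tile=16):
    """The same computation after a loop-tiling transformation.

    Tiling the i/j loops is one point in the schedule space an automatic
    optimizer explores (tile sizes, loop orders, vectorization, ...);
    the numerical result is unchanged.
    """
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k in range(K):
                for i in range(i0, min(i0 + tile, M)):
                    for j in range(j0, min(j0 + tile, N)):
                        C[i, j] += A[i, k] * B[k, j]
    return C

if __name__ == "__main__":
    A = np.random.rand(64, 32).astype("float32")
    B = np.random.rand(32, 48).astype("float32")
    np.testing.assert_allclose(matmul_naive(A, B), matmul_tiled(A, B), rtol=1e-5)
```

The two programs compute identical results; what changes is the loop structure, and with it locality and parallelization opportunities, which is exactly the schedule space that automatic program optimization explores.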

ICML 2024 Tutorial

Towards Efficient Generative Large Language Model Serving: A Tutorial from Algorithms to Systems

July 2024
Xupeng Miao, Zhihao Jia
Vienna, Austria

This tutorial surveys efficient serving of generative large language models from algorithms to systems, covering both algorithmic approaches and system-level optimizations. We discuss key techniques for improving inference performance, reducing latency, and optimizing resource utilization across diverse hardware platforms; a toy sketch of speculative decoding follows the topic list below.

Key Topics Covered:

  • LLM serving architectures
  • Speculative inference techniques
  • Memory optimization strategies
  • Distributed serving systems
  • Hardware acceleration
  • Performance benchmarking
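
As a concrete illustration of speculative inference, the sketch below shows the draft-then-verify loop in its simplest greedy form: a cheap draft model proposes a few tokens and the expensive target model accepts them while it agrees. `draft_model` and `target_model` are hypothetical stand-ins for real next-token predictors; a production system would verify all drafted positions in a single batched target forward pass and use a probabilistic acceptance rule when sampling.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins: each maps a token prefix to the model's next token.
NextToken = Callable[[Sequence[int]], int]

def speculative_decode(prefix: List[int],
                       draft_model: NextToken,
                       target_model: NextToken,
                       num_draft: int = 4,
                       max_new_tokens: int = 32) -> List[int]:
    """Greedy speculative decoding: the cheap draft model proposes `num_draft`
    tokens, the expensive target model verifies them, and every accepted draft
    token saves one target decoding step."""
    tokens = list(prefix)
    while len(tokens) < len(prefix) + max_new_tokens:
        # 1) Draft phase: propose a short continuation with the cheap model.
        draft, ctx = [], list(tokens)
        for _ in range(num_draft):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)

        # 2) Verify phase: accept drafted tokens while the target model agrees.
        #    (A real system scores all drafted positions in one target forward pass.)
        for t in draft:
            expected = target_model(tokens)
            if expected == t:
                tokens.append(t)          # accepted draft token
            else:
                tokens.append(expected)   # rejected: fall back to the target's token
                break
        else:
            # All drafts accepted: the final verification yields one bonus token.
            tokens.append(target_model(tokens))
    return tokens[: len(prefix) + max_new_tokens]
```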

SIGMOD 2024 Tutorial

Efficient Systems for Large Language Model Serving

June 2024
Xupeng Miao, Zhihao Jia, Bin Cui
Santiago, Chile

This tutorial focuses on efficient systems for large language model serving, presenting state-of-the-art techniques for optimizing LLM inference in database and data-management contexts. We cover system design principles, optimization strategies, and practical deployment considerations for serving LLMs at scale; a toy KV-cache block-allocation sketch follows the topic list below.

Key Topics Covered:

  • LLM serving system design
  • Query optimization for LLMs
  • Resource management
  • Scalability techniques
  • Integration with databases
  • Performance evaluation
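
One recurring resource-management idea in LLM serving systems is paged KV-cache allocation, popularized by PagedAttention-style designs: GPU memory is split into fixed-size blocks and mapped to sequences on demand instead of reserving worst-case space per request. The sketch below is a toy allocator in that spirit; the class name, block size, and methods are illustrative and do not correspond to any particular system's API.

```python
class KVCacheBlockManager:
    """Toy paged KV-cache allocator: memory is carved into fixed-size blocks,
    and each request maps its growing KV cache onto a per-request block table
    instead of reserving max-sequence-length memory up front."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size              # tokens cached per block
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}                    # request id -> list of block ids

    def can_admit(self, prompt_len: int) -> bool:
        """Admission control: is there room for this request's prompt?"""
        needed = -(-prompt_len // self.block_size)   # ceiling division
        return needed <= len(self.free_blocks)

    def allocate(self, request_id: str, prompt_len: int) -> None:
        """Reserve enough blocks to hold the prompt's KV cache."""
        needed = -(-prompt_len // self.block_size)
        self.block_tables[request_id] = [self.free_blocks.pop() for _ in range(needed)]

    def append_token(self, request_id: str, cached_len: int) -> None:
        """Grow a request by one token; grab a new block only at block boundaries.

        `cached_len` is the number of tokens already cached for the request."""
        if cached_len % self.block_size == 0:
            self.block_tables[request_id].append(self.free_blocks.pop())

    def free(self, request_id: str) -> None:
        """Return all of a finished request's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(request_id))
```

A continuous-batching scheduler would call `can_admit` before adding a request to the running batch and `free` as soon as the request finishes, which is what keeps memory utilization high under bursty workloads.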

ASPLOS 2025 Tutorial

GenAI Catalyst: Efficient Systems and Compilers for Generative AI

March 30, 2025
Xupeng Miao, Zhihao Jia
Rotterdam, Netherlands

This tutorial provides a comprehensive overview of state-of-the-art techniques for designing and implementing systems and compilers that optimize the performance of generative AI models, especially large language models. We present Mirage, a multi-level superoptimization-based tensor program compiler; FlexFlow Serve, for low-latency LLM serving; and FlexLLM, for memory-efficient LLM finetuning. A toy illustration of the superoptimization idea follows the topic list below.

Key Topics Covered:

  • Mirage: Multi-level superoptimization compiler
  • FlexFlow Serve: Distributed LLM serving
  • FlexLLM: Memory-efficient finetuning
  • Tensor program optimization
  • Speculative inference techniques
  • GPU kernel generation
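
To give a feel for the superoptimization idea behind compilers like Mirage (without reproducing its actual API), the toy sketch below enumerates two algebraically equivalent ways to evaluate a small tensor expression, checks that they agree on random inputs, and picks the cheaper one under a simple FLOP cost model. Real multi-level superoptimizers search a far larger space of graph- and kernel-level candidates, but the select-the-cheapest-verified-program loop is the same in spirit.

```python
import numpy as np

def flops_left(m, k, n, p):
    """FLOPs for (A @ B) @ C with A:(m,k), B:(k,n), C:(n,p)."""
    return 2 * m * k * n + 2 * m * n * p

def flops_right(m, k, n, p):
    """FLOPs for A @ (B @ C)."""
    return 2 * k * n * p + 2 * m * k * p

def optimize_chain(A, B, C):
    """Toy superoptimization loop: both candidate programs compute the same
    result; verify equivalence on the given (random) inputs, then pick the
    cheaper candidate under a FLOP cost model."""
    m, k = A.shape
    _, n = B.shape
    _, p = C.shape
    candidates = {
        "(A @ B) @ C": (lambda: (A @ B) @ C, flops_left(m, k, n, p)),
        "A @ (B @ C)": (lambda: A @ (B @ C), flops_right(m, k, n, p)),
    }
    # Equivalence check on random data (up to floating-point rounding),
    # echoing the probabilistic verification used by real superoptimizers.
    outputs = {name: fn() for name, (fn, _) in candidates.items()}
    reference = next(iter(outputs.values()))
    for out in outputs.values():
        np.testing.assert_allclose(out, reference, rtol=1e-3)
    best_name, (_, best_flops) = min(candidates.items(), key=lambda kv: kv[1][1])
    return best_name, best_flops

if __name__ == "__main__":
    A = np.random.rand(512, 16).astype("float32")
    B = np.random.rand(16, 512).astype("float32")
    C = np.random.rand(512, 8).astype("float32")
    name, flops = optimize_chain(A, B, C)
    print(f"picked {name} at ~{flops:,} FLOPs")   # A @ (B @ C) wins for these shapes
```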