Machine Learning Compilation
Deploying innovative AI models in different production environments has become a common problem as AI applications grow more ubiquitous in our daily lives. Deploying both training and inference workloads brings great challenges as we start to support a combinatorial set of models and environments. Additionally, real-world applications bring a multitude of goals, such as minimizing dependencies, broadening model coverage, leveraging emerging hardware primitives for performance, reducing memory footprint, and scaling to larger environments.
Solving these problems for training and inference involves a combination of ML programming abstractions, learning-driven search, compilation, and optimized library runtimes. These themes form an emerging topic – machine learning compilation – that is under active development. In this tutorial sequence, we offer the first comprehensive treatment of its kind, systematically studying the key elements of this emerging field. We will learn the key abstractions used to represent machine learning programs, automatic optimization techniques, and approaches to optimizing dependencies, memory, and performance in end-to-end machine learning deployment.
This material serves as the reference for the MLC course; we will populate notes and tutorials here as the course progresses.
- 1. Introduction
- 2. Tensor Program Abstraction
- 3. End to End Model Execution
- 3.1. Prelude
- 3.2. Preparations
- 3.3. End to End Model Integration
- 3.4. Constructing an End to End IRModule in TVMScript
- 3.5. Build and Run the Model
- 3.6. Integrate Existing Libraries in the Environment
- 3.7. Mixing TensorIR Code and Libraries
- 3.8. Bind Parameters to IRModule
- 3.9. Discussions
- 3.10. Summary
- 4. Automatic Program Optimization
- 5. Integration with Machine Learning Frameworks
- 6. GPU and Hardware Acceleration
- 7. Computational Graph Optimization
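To give a small taste of the tensor program abstraction covered in the chapters above, the sketch below shows the core idea in plain Python, with no TVM dependency: an ML compiler takes a high-level operator and rewrites it as an explicit loop-level program, which then becomes the target of transformations such as tiling and parallelization. All function names here are illustrative, not part of any real TVM or TVMScript API.

```python
# Illustrative sketch only: names are hypothetical, not a real TVM API.

def add_highlevel(a, b):
    # High-level view: a single operator applied to whole tensors.
    return [x + y for x, y in zip(a, b)]

def add_lowered(a, b):
    # Lowered view: explicit buffers and a loop nest, the form a
    # machine learning compiler actually schedules and optimizes.
    n = len(a)
    c = [0.0] * n
    for i in range(n):  # transformations (tiling, vectorization) target this loop
        c[i] = a[i] + b[i]
    return c

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
# Both views compute the same result; only the representation differs.
assert add_highlevel(a, b) == add_lowered(a, b)
```

The point of keeping both representations is that correctness is defined by the high-level form, while performance work happens on the lowered form, and the compiler must guarantee the two stay equivalent.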