As AI applications become ubiquitous in our daily lives, deploying innovative AI models across different production environments has become a common problem. Deploying both training and inference workloads brings great challenges as we start to support a combinatorial number of model and environment pairings. Real-world applications also bring with them a multitude of goals, such as minimizing dependencies, broadening model coverage, leveraging emerging hardware primitives for performance, reducing memory footprint, and scaling to larger environments.

Solving these problems for training and inference involves a combination of ML programming abstractions, learning-driven search, compilation, and optimized library runtimes. Together these themes form an emerging field, machine learning compilation, which is under active development. In this tutorial series, we offer the first comprehensive treatment of its kind to study the key elements of this field systematically. We will learn the key abstractions for representing machine learning programs, automatic optimization techniques, and approaches to optimizing dependencies, memory, and performance in end-to-end machine learning deployment.

Audience and Prerequisites

This course targets people who work on machine learning in the wild. ML in practice is a broad topic that involves collaboration among multiple groups, including machine learning scientists, machine learning engineers, and hardware providers.

The course requires only a minimal set of prerequisites in data science and machine learning.