Welcome to MLC-LLM!
🚧 This doc is under heavy construction.
MLC LLM is the universal deployment solution that allows LLMs to run locally with native hardware acceleration on consumer devices.
Navigate by Topics
MLC LLM offers a set of pre-compiled models (Off-the-Shelf Models), as well as Python scripts that let developers define and compile models, either reusing existing architectures with customized weights or creating new ones.
An MLC-compiled model consists of two elements: the model weights and a library of CPU/GPU compute kernels. For easy integration and redistribution, MLC further offers lightweight platform-specific runtimes with a user-friendly interface for interacting with compiled models.
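For a concrete sense of that interface, the sketch below drives a compiled model from the Python runtime. It assumes the `mlc_chat` package is installed and that the model id shown (illustrative, not prescriptive) already points at compiled weights and a kernel library on disk:

```python
# A minimal sketch, assuming the mlc_chat package is installed and a model
# has already been compiled or downloaded locally; the id is illustrative.
from mlc_chat import ChatModule

cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1")  # load weights + kernels
print(cm.generate(prompt="What is MLC LLM?"))        # run one chat turn
```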
mlc_chat_cli is the CLI app provided to load a compiled model's weights and compute kernels.
Demo:
Install or build the CLI app: Install MLC-LLM CLI.
Run compiled models via the CLI app: Run the Models Through CLI.
MLC compiles a model to WebGPU and WebAssembly, which can then be executed by the MLC LLM JavaScript runtime.
Demo:
Set up WebLLM: Web-LLM project.
Use MLC-compiled models in your own JavaScript project: Web-LLM NPM Package.
A model can be compiled to static system libraries and linked into an iOS app. An example iOS app with a clear structure is provided for iOS developers to refer to when shipping LLMs on iOS.
Demo:
Set up iOS: iOS.
A model can be compiled to static system libraries and linked into an Android app. An example Android app with a clear structure is provided for Android developers to refer to when shipping LLMs on Android.
Demo:
Set up Android: Android.
MLC LLM is a Python package that uses TVM Unity to compile LLMs for universal deployment; a minimal compile sketch follows the lists below.
Install TVM Unity: Installation Guidelines.
Compile models: How to Compile Models.
Contribute new models: Contribute New Models to MLC-LLM.
Install TVM Unity: Installation Guidelines.
Define new model architectures: Add New Model Architectures.
Contribute new models: Contribute New Models to MLC-LLM.
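To make the compile step concrete, here is a minimal sketch of invoking the build from Python. It assumes the `mlc_llm.build` entry point and the flags shown exist in your checkout; the model name, target, and quantization code are illustrative, and How to Compile Models remains the authoritative reference:

```python
# A minimal sketch, assuming the mlc_llm.build entry point and these flags
# exist in your checkout; model, target, and quantization are illustrative.
import subprocess

subprocess.run(
    [
        "python3", "-m", "mlc_llm.build",
        "--model", "Llama-2-7b-chat-hf",  # source model to compile
        "--target", "cuda",               # backend to emit kernels for
        "--quantization", "q4f16_1",      # 4-bit weights, fp16 compute
    ],
    check=True,  # raise if the build fails
)
```

The output is a model directory containing the quantized weights plus the compiled kernel library, which the runtimes above (CLI, web, iOS, Android, Python) can then load.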
All tutorials
Contribute to MLC-LLM
Model Zoo