Projects
Explore our comprehensive suite of tools and libraries for high-performance machine learning compilation across diverse hardware platforms.
Core Projects
MLC LLM
High-performance, memory-efficient LLM inference across devices and backends. Built with advanced compilation and runtime optimizations for CPUs, GPUs, and mobile.
pip install mlc-llm
# Load a model and run an OpenAI-style chat completion
from mlc_llm import MLCEngine
model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
engine = MLCEngine(model)
response = engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello, world!"}],
    model=model,
)
print(response.choices[0].message.content)
WebLLM
In-browser LLM inference on WebGPU with zero server dependency. Ship private, fast AI experiences that run entirely client-side.
FlexFlow Serve
Low-latency, high-performance LLM serving built on speculative inference. It uses tree-based speculative decoding and token tree verification to significantly reduce end-to-end latency while preserving model quality.
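As a rough illustration of the verification idea (not FlexFlow Serve's actual API; the model stub and names below are hypothetical), this sketch accepts the longest path in a drafted token tree that matches the target model's own greedy predictions:
# Sketch of token tree verification (illustrative only, not the
# FlexFlow Serve API). A draft model has proposed a tree of candidate
# tokens; the target model keeps the longest path matching its own
# greedy predictions. A real system scores every tree node in a single
# forward pass using a tree attention mask.

def target_next_token(prefix):
    # Hypothetical stand-in for the target model: next token = last + 1.
    return prefix[-1] + 1

# Draft tree as (token, children) pairs; the roots continue `prefix`.
draft_tree = [
    (2, [(3, [(9, [])]),   # drafted path 2 -> 3 -> 9
         (5, [])]),        # drafted path 2 -> 5
    (7, []),               # drafted path 7
]

def verify(prefix, children):
    """Accept the longest drafted path the target model agrees with."""
    accepted = []
    while children:
        expected = target_next_token(prefix + accepted)
        match = next((c for c in children if c[0] == expected), None)
        if match is None:
            break                 # disagreement: reject this subtree
        accepted.append(match[0])
        children = match[1]
    # Always emit one token from the target model itself, so progress
    # is made even when the entire draft tree is rejected.
    accepted.append(target_next_token(prefix + accepted))
    return accepted

print(verify([1], draft_tree))    # -> [2, 3, 4]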
Mirage
Automated kernel and graph optimization for LLM workloads. It combines schedule search with automatic code generation for maximum performance.
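As a hypothetical sketch of what schedule search involves (this is not Mirage's API; the cost model, cache budget, and tile sizes are invented for illustration), the example below enumerates candidate tilings for a matrix multiply and keeps the cheapest under a toy analytic cost model. A real superoptimizer searches a far larger space of kernels and benchmarks or verifies each candidate:
# Hypothetical schedule-search sketch (not the Mirage API): enumerate
# candidate tile sizes for a 4096x4096 matmul and pick the cheapest
# under a toy cost model that penalizes cache spills.
import itertools

M = N = K = 4096
CACHE_BYTES = 128 * 1024  # assumed per-core cache budget

def cost(tile_m, tile_n, tile_k):
    """Toy cost model: fewer tiles amortize overhead; spills dominate."""
    working_set = 4 * (tile_m * tile_k + tile_k * tile_n + tile_m * tile_n)
    spill_penalty = 10.0 if working_set > CACHE_BYTES else 1.0
    num_tiles = (M // tile_m) * (N // tile_n) * (K // tile_k)
    return num_tiles * spill_penalty

candidates = [16, 32, 64, 128]
best = min(itertools.product(candidates, repeat=3), key=lambda t: cost(*t))
print("best (tile_m, tile_n, tile_k):", best)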
XGrammar
Constrained decoding with expressive grammars for structured generation. Produce JSON, SQL, and domain-specific formats reliably.
{
"type": "object",
"properties": { "name": { "type": "string" } },
"required": ["name"]
}
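The mechanism underneath constrained decoding is token masking: at each step, tokens the grammar would reject are removed from the model's next-token distribution. The following is a conceptual sketch of that idea, not the XGrammar API; the toy vocabulary, random scores, and enumerated "grammar" are stand-ins for a compiled schema:
# Conceptual sketch of constrained decoding (the idea behind XGrammar,
# not its API): at each step, mask every token that would take the
# output outside the grammar, then pick from what remains.
import random

VOCAB = ['{', '}', '"name"', ': ', '"Ada"', '"Bob"', 'hello', '42']

# Toy "grammar": the set of strings we accept. A real grammar engine
# derives the valid-prefix check from a JSON schema or EBNF instead of
# enumerating whole strings.
LANGUAGE = ['{"name": "Ada"}', '{"name": "Bob"}']

def allowed(prefix):
    return any(s.startswith(prefix) for s in LANGUAGE)

def fake_logits(_prefix):
    """Stand-in for a language model's next-token scores."""
    return [random.random() for _ in VOCAB]

def constrained_decode(max_steps=10):
    out = ''
    for _ in range(max_steps):
        scores = fake_logits(out)
        # The mask: keep only tokens the grammar can still extend.
        legal = [i for i, tok in enumerate(VOCAB) if allowed(out + tok)]
        if not legal:
            break                 # no legal continuation: stop
        out += VOCAB[max(legal, key=lambda i: scores[i])]
        if out in LANGUAGE:
            break                 # reached a complete, valid string
    return out

print(constrained_decode())       # e.g. {"name": "Ada"}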