Quick Start

Examples

To get started, try MLC LLM with int4-quantized Llama3 8B. At least 6 GB of free VRAM is recommended to run it.
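As a rough sanity check on the VRAM recommendation, the weight storage of a quantized model can be estimated from its parameter count and bits per weight. The sketch below is back-of-the-envelope arithmetic only: the 8e9 parameter count is read off the "8B" in the model name, the 4.5 bits-per-weight figure is a hypothetical value illustrating overhead from quantization scales, and `estimate_weight_gb` is an illustrative helper, not part of MLC LLM. The remaining headroom is needed for the KV cache, activations, and runtime overhead, which this estimate does not cover.

```python
def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "8B" in the model name is read as 8 billion parameters.
NUM_PARAMS = 8e9

# Pure 4-bit weights, with no per-group scale overhead.
print(f"4.0 bits/weight: {estimate_weight_gb(NUM_PARAMS, 4.0):.1f} GB")

# Hypothetical effective width once quantization scales are included.
print(f"4.5 bits/weight: {estimate_weight_gb(NUM_PARAMS, 4.5):.1f} GB")
```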

Install MLC LLM. MLC LLM is available via pip. It is always recommended to install it in an isolated conda virtual environment.

Run chat completion in Python. The following Python script showcases the Python API of MLC LLM:

from mlc_llm import MLCEngine

# Create engine
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Run chat completion through the OpenAI-style API.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print("\n")

engine.terminate()

Documentation and tutorial. Python API reference and its tutorials are available online.

Figure: MLC LLM Python API (https://raw.githubusercontent.com/mlc-ai/web-data/main/images/mlc-llm/tutorials/python-engine-api.jpg)

Install MLC LLM. MLC LLM is available via pip. It is always recommended to install it in an isolated conda virtual environment.

Launch a REST server. Run the following command from the command line to launch a REST server at http://127.0.0.1:8000.

mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

Send requests to server. When the server is ready (showing INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)), open a new shell and send a request via the following command:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
        "model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
        "messages": [
            {"role": "user", "content": "Hello! Our project is MLC LLM. What is the name of our project?"}
        ]
  }' \
  http://127.0.0.1:8000/v1/chat/completions
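The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the URL, model string, and payload are copied from the curl command above. Only the request is built at import time; nothing is sent until `send_chat_request` is called with the server running.

```python
import json
import urllib.request

URL = "http://127.0.0.1:8000/v1/chat/completions"
MODEL = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"


def build_chat_request(prompt: str, url: str = URL, model: str = MODEL) -> urllib.request.Request:
    """Build a POST request carrying the same JSON body as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(prompt: str) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)


# Build (but do not send) the request, then inspect it.
request = build_chat_request("Hello! Our project is MLC LLM. What is the name of our project?")
print(request.get_method(), request.full_url)
```

Assuming the response follows the OpenAI chat completion layout, the reply text would be found at `send_chat_request(...)["choices"][0]["message"]["content"]`.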

Documentation and tutorial. Check out REST API for the REST API reference and tutorial. Our REST API has complete OpenAI API support.

Figure: Send HTTP request to REST server in MLC LLM (https://raw.githubusercontent.com/mlc-ai/web-data/main/images/mlc-llm/tutorials/python-serve-request.jpg)

Install MLC LLM. MLC LLM is available via pip. It is always recommended to install it in an isolated conda virtual environment.

For Windows/Linux users, make sure the latest Vulkan driver is installed.

Run in command line.

mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

If you are using Windows, Linux, or Steam Deck and would like to use Vulkan, we recommend installing the necessary Vulkan loader dependency via conda to avoid "Vulkan not found" issues.

conda install -c conda-forge gcc libvulkan-loader

What to Do Next

Check out Introduction to MLC LLM for an introduction to the complete workflow in MLC LLM.
Depending on your use case, check out our API documentation and tutorial pages:
Convert model weight to MLC format, if you want to run your own models.
Compile model libraries, if you want to deploy to web/iOS/Android or control the model optimizations.
To report a problem or ask a question, open a new issue in our GitHub repo.