Install TVM Unity Compiler

TVM Unity, the latest development in Apache TVM, is required to build MLC LLM. Its features include:

  • High-performance CPU/GPU code generation instantly without tuning;

  • Dynamic shape and symbolic shape tracking by design;

  • Supporting both inference and training;

  • Productive Python-first compiler implementation. As a concrete example, MLC LLM compilation is implemented in pure Python using its API.

TVM Unity can be installed directly from a prebuilt developer package, or built from source.

Option 1. Prebuilt Package

A nightly prebuilt Python package of Apache TVM Unity is provided.

Note

❗ Whenever using Python, it is highly recommended to use conda to manage an isolated Python environment to avoid missing dependencies, incompatible versions, and package conflicts.
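
If you do not yet have such an environment, one can be created first, for example as follows (the name your-environment and python=3.11 are placeholder choices):

conda create -n your-environment python=3.11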

conda activate your-environment
python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly

Note

If you encounter issues with GLIBC not being found, please install the latest glibc in conda:

conda install -c conda-forge libgcc-ng

Option 2. Build from Source

While it is generally recommended to always use the prebuilt TVM Unity, you may need to build it from source if you require more customization. Note that this should only be attempted if you are familiar with the intricacies of C++, CMake, LLVM, Python, and other related systems.

Details

Step 1. Set up build dependency. To build from source, you need to ensure that the following build dependencies are met:

  • CMake >= 3.24

  • LLVM >= 15

  • Git

  • (Optional) CUDA >= 11.8 (targeting NVIDIA GPUs)

  • (Optional) Metal (targeting Apple GPUs such as M1 and M2)

  • (Optional) Vulkan (targeting NVIDIA, AMD, Intel and mobile GPUs)

  • (Optional) OpenCL (targeting NVIDIA, AMD, Intel and mobile GPUs)

Note

  • To target NVIDIA GPUs, either CUDA or Vulkan is required (CUDA is recommended);

  • For AMD and Intel GPUs, Vulkan is necessary;

  • When targeting Apple (macOS, iOS, iPadOS), Metal is a mandatory dependency;

  • Some Android devices only support OpenCL, but most of them support Vulkan.

The easiest way to manage dependencies is via conda, which maintains a set of toolchains, including LLVM, across platforms. To create an environment with these build dependencies, one may simply use:

Set up build dependencies in conda
# make sure to start with a fresh environment
conda env remove -n tvm-build-venv
# create the conda environment with build dependency
conda create -n tvm-build-venv -c conda-forge \
    "llvmdev>=15" \
    "cmake>=3.24" \
    git \
    python=3.11
# enter the build environment
conda activate tvm-build-venv
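
Before building, it may be worth double-checking that the toolchain from conda, rather than a system-wide one, is being picked up:

Verify the build toolchain (optional)
# each of these should resolve to a binary inside the tvm-build-venv environment
cmake --version
llvm-config --version
git --version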

Step 2. Configure and build. A standard git-based workflow is recommended to download Apache TVM Unity, after which build requirements are specified in config.cmake:

Download TVM Unity from GitHub
# clone from GitHub
git clone --recursive git@github.com:mlc-ai/relax.git tvm-unity && cd tvm-unity
# create the build directory
rm -rf build && mkdir build && cd build
# specify build requirements in `config.cmake`
cp ../cmake/config.cmake .

Note

We are temporarily using mlc-ai/relax instead of the upstream repository; it carries several outstanding changes that we will upstream to Apache TVM’s unity branch.

We want to specifically tweak the following flags by appending them to the end of the configuration file:

Configure build in config.cmake
# controls default compilation flags
echo "set(CMAKE_BUILD_TYPE RelWithDebInfo)" >> config.cmake
# LLVM is a must dependency
echo "set(USE_LLVM \"llvm-config --ignore-libllvm --link-static\")" >> config.cmake
echo "set(HIDE_PRIVATE_SYMBOLS ON)" >> config.cmake
# GPU SDKs, turn on if needed
echo "set(USE_CUDA   OFF)" >> config.cmake
echo "set(USE_METAL  OFF)" >> config.cmake
echo "set(USE_VULKAN OFF)" >> config.cmake
echo "set(USE_OPENCL OFF)" >> config.cmake
# FlashInfer related, requires CUDA w/ compute capability 80;86;89;90
echo "set(USE_FLASHINFER OFF)" >> config.cmake
echo "set(FLASHINFER_CUDA_ARCHITECTURES YOUR_CUDA_COMPUTE_CAPABILITY_HERE)" >> config.cmake
echo "set(CMAKE_CUDA_ARCHITECTURES YOUR_CUDA_COMPUTE_CAPABILITY_HERE)" >> config.cmake

Note

HIDE_PRIVATE_SYMBOLS is a configuration option that enables the -fvisibility=hidden flag. This flag helps prevent potential symbol conflicts between TVM and PyTorch. These conflicts arise due to the frameworks shipping LLVMs of different versions.

CMAKE_BUILD_TYPE controls the default compilation flags:

  • Debug sets -O0 -g

  • RelWithDebInfo sets -O2 -g -DNDEBUG (recommended)

  • Release sets -O3 -DNDEBUG

Note

If you are using CUDA and your compute capability is above 80, then it is required to build with set(USE_FLASHINFER ON). Otherwise, you may run into a Cannot find PackedFunc issue at runtime.

To check your CUDA compute capability, you can use nvidia-smi --query-gpu=compute_cap --format=csv.
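
As a concrete illustration (an assumption for this example, not part of the default configuration), a CUDA build for a GPU of compute capability 80, such as an NVIDIA A100, would have the relevant lines in config.cmake read as follows; substitute the compute capability reported by nvidia-smi for your own GPU:

Example: CUDA flags in config.cmake for compute capability 80
set(USE_CUDA ON)
set(USE_FLASHINFER ON)
set(FLASHINFER_CUDA_ARCHITECTURES 80)
set(CMAKE_CUDA_ARCHITECTURES 80)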

Once config.cmake is edited accordingly, kick off build with the commands below:

Build libtvm using cmake
cmake .. && cmake --build . --parallel $(nproc)

A successful build should produce libtvm and libtvm_runtime under the /path-tvm-unity/build/ directory.
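
A quick way to confirm this (the shared-library suffix is .so on Linux and .dylib on macOS):

# run from inside the build directory; expect both libtvm and libtvm_runtime
ls -l libtvm*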

Leaving the build environment tvm-build-venv, there are two ways to install the successful build into your environment. The first is to point PYTHONPATH at the TVM python package:

export PYTHONPATH=/path-to-tvm-unity/python:$PYTHONPATH
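
The second is to install the TVM python package into the target environment with pip; a typical sequence, assuming the standard layout of the TVM source tree, is:

conda activate your-own-env
cd /path-to-tvm-unity/python
pip install -e .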

Validate TVM Installation

Using a compiler infrastructure with multiple language bindings can be error-prone. Therefore, it is highly recommended to validate the TVM Unity installation before use.

Step 1. Locate TVM Python package. The following command can help confirm that TVM is properly installed as a python package and provide the location of the TVM python package:

>>> python -c "import tvm; print(tvm.__file__)"
/some-path/lib/python3.11/site-packages/tvm/__init__.py

Step 2. Confirm which TVM library is used. When maintaining multiple builds or installations of TVM, it becomes important to double-check that the python package is using the proper libtvm with the following command:

>>> python -c "import tvm; print(tvm._ffi.base._LIB)"
<CDLL '/some-path/lib/python3.11/site-packages/tvm/libtvm.dylib', handle 95ada510 at 0x1030e4e50>

Step 3. Reflect TVM build options. Sometimes when a downstream application fails, the cause may be a wrong TVM commit or wrong build flags. The following command helps to find this out:

>>> python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"
... # Omitted less relevant options
GIT_COMMIT_HASH: 4f6289590252a1cf45a4dc37bce55a25043b8338
HIDE_PRIVATE_SYMBOLS: ON
USE_LLVM: llvm-config --link-static
LLVM_VERSION: 15.0.7
USE_VULKAN: OFF
USE_CUDA: OFF
CUDA_VERSION: NOT-FOUND
USE_OPENCL: OFF
USE_METAL: ON
USE_ROCM: OFF

Note

GIT_COMMIT_HASH indicates the exact commit of the TVM build, and it can be found on GitHub via https://github.com/mlc-ai/relax/commit/$GIT_COMMIT_HASH.
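
For example, to extract just the commit hash, using the key shown in the libinfo() output above:

>>> python -c "import tvm; print(tvm.support.libinfo()['GIT_COMMIT_HASH'])"
4f6289590252a1cf45a4dc37bce55a25043b8338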

Step 4. Check device detection. Sometimes it is helpful to check whether TVM can detect your device at all, using the following commands:

>>> python -c "import tvm; print(tvm.metal().exist)"
True # or False
>>> python -c "import tvm; print(tvm.cuda().exist)"
False # or True
>>> python -c "import tvm; print(tvm.vulkan().exist)"
False # or True

Please note that the commands above verify the presence of an actual device on the local machine, which the TVM runtime (not the compiler) needs in order to execute properly. However, the TVM compiler can perform compilation tasks without requiring a physical device: as long as the necessary toolchain, such as NVCC, is available, TVM supports cross-compilation even in the absence of an actual device.
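
As a small illustration of this separation, constructing a compilation target succeeds even when the device itself is absent (the printed attributes vary across TVM versions, so the output below is elided):

>>> python -c "import tvm; print(tvm.target.Target('cuda'))"
cuda -keys=cuda,gpu ...  # succeeds even if tvm.cuda().exist is False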