tvm.runtime.disco

TVM distributed runtime API.

class tvm.runtime.disco.DModule(dref: DRef, session: Session)

A Module in a Disco session.

class tvm.runtime.disco.DPackedFunc(dref: DRef, session: Session)

A PackedFunc in a Disco session.

class tvm.runtime.disco.DRef

An object that exists on all workers. The controller process assigns a unique “register id” to each object, and the worker process uses this id to refer to the object residing on itself.
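As a conceptual illustration only, the register-id scheme described above can be pictured with a small pure-Python model. This is not Disco's implementation; `Controller`, `Worker`, and `alloc` are hypothetical names introduced for the sketch. The controller hands out one id per distributed object, and each worker stores its own local value under that id.

```python
class Worker:
    """Hypothetical worker: maps register ids to its own local objects."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.registers = {}


class Controller:
    """Hypothetical controller: assigns one register id per distributed object."""

    def __init__(self, num_workers):
        self.workers = [Worker(i) for i in range(num_workers)]
        self._next_reg_id = 0

    def alloc(self, make_value):
        # The same id is used on every worker; each worker holds its own value.
        reg_id = self._next_reg_id
        self._next_reg_id += 1
        for worker in self.workers:
            worker.registers[reg_id] = make_value(worker.worker_id)
        return reg_id


controller = Controller(num_workers=2)
reg = controller.alloc(lambda worker_id: [worker_id] * 3)
```

In this model, `reg` plays the role of a DRef: the controller only holds the id, while the data lives in each worker's `registers` table.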

debug_copy_from(worker_id: int, value: ndarray | NDArray) None

Copy an NDArray value to remote for debugging purposes.

Parameters:
  • worker_id (int) – The id of the worker to be copied to.

  • value (Union[numpy.ndarray, NDArray]) – The value to be copied.

debug_get_from_remote(worker_id: int) Any

Get the value of a DRef from a remote worker. It is only used for debugging purposes.

Parameters:

worker_id (int) – The id of the worker to be fetched from.

Returns:

value – The value of the register.

Return type:

object
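A minimal sketch of how the two debugging helpers might be combined. It assumes `dref` behaves like a DRef that already refers to an NDArray of matching shape and dtype on the chosen worker; the helper name `debug_round_trip` is hypothetical.

```python
def debug_round_trip(dref, worker_id, value):
    """Write `value` into `dref` on one worker, then read it back.

    `dref` is expected to behave like tvm.runtime.disco.DRef.
    Intended for debugging only, as the methods' own docs state.
    """
    dref.debug_copy_from(worker_id, value)
    return dref.debug_get_from_remote(worker_id)
```

Only the selected worker is touched; other workers keep whatever value they already hold for the same DRef.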

class tvm.runtime.disco.ProcessSession(num_workers: int, entrypoint: str)

A Disco session backed by pipe-based multi-processing.

class tvm.runtime.disco.Session

A Disco interactive session. It allows users to interact with the Disco command queue using various PackedFunc calling conventions.

allgather(src: DRef, dst: DRef) DRef

Perform an allgather operation on an array.

Parameters:
  • src (DRef) – The array to be gathered from.

  • dst (DRef) – The array to be gathered to.
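To make the data movement concrete, here is a pure-Python model of allgather semantics using flat lists, one per worker. It is an illustration of the general collective, not TVM code, and `allgather_model` is a hypothetical name. The exact layout of `dst` in Disco is not specified above, so plain concatenation in worker order is an assumption.

```python
def allgather_model(per_worker_src):
    """Return what each worker's dst would hold after an allgather.

    per_worker_src[i] is the flat src array held by worker i.
    """
    # Concatenate every worker's contribution in worker order (assumed layout).
    gathered = [x for src in per_worker_src for x in src]
    # Every worker ends up with its own copy of the full result.
    return [list(gathered) for _ in per_worker_src]
```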

allreduce(src: DRef, dst: DRef, op: str = 'sum') DRef

Perform an allreduce operation on an array.

Parameters:
  • src (DRef) – The array to be reduced.

  • dst (DRef) – The array that receives the reduced result.

  • op (str = "sum") – The reduce operation to be performed. Available options are “sum”, “prod”, “min”, “max”, and “avg”.
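The reduce operations can be illustrated with a pure-Python model over flat lists, one per worker. This is a sketch of the general allreduce semantics, not TVM's implementation; `allreduce_model` and `_REDUCERS` are hypothetical names.

```python
import math

# One reducer per documented `op` string.
_REDUCERS = {
    "sum": sum,
    "prod": math.prod,
    "min": min,
    "max": max,
    "avg": lambda xs: sum(xs) / len(xs),
}


def allreduce_model(per_worker_src, op="sum"):
    """Return what each worker's result would hold after an allreduce."""
    reduce_fn = _REDUCERS[op]
    # zip(*...) groups the i-th element of every worker's array together.
    reduced = [reduce_fn(list(column)) for column in zip(*per_worker_src)]
    # Every worker receives its own copy of the element-wise result.
    return [list(reduced) for _ in per_worker_src]
```

For two workers holding `[1, 2]` and `[3, 4]`, the "sum" result on each worker is `[4, 6]`.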

broadcast_from_worker0(src: DRef, dst: DRef) DRef

Broadcast an array from worker-0 to all other workers.

Parameters:

  • src (DRef) – The array on worker-0 to be broadcast.

  • dst (DRef) – The array on each worker that receives the broadcast data.
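A minimal usage sketch, assuming `sess` is an existing Session (for example a ThreadedSession) and that source and destination buffers share the same shape and dtype. The helper name `broadcast_buffers` is hypothetical.

```python
def broadcast_buffers(sess, shape, dtype):
    """Allocate src/dst on all workers and broadcast worker-0's src into dst.

    `sess` is expected to behave like tvm.runtime.disco.Session.
    """
    src = sess.empty(shape, dtype)
    dst = sess.empty(shape, dtype)
    # In real use, worker-0's `src` would be filled first,
    # e.g. with sess.copy_to_worker_0(host_array, src).
    sess.broadcast_from_worker0(src, dst)
    return src, dst
```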

call_packed(func: DRef, *args) DRef

Call a PackedFunc on workers providing variadic arguments.

Parameters:
  • func (DRef) – A reference to the PackedFunc to be called.

  • *args (various types) – The variadic arguments. Supported types are: integers and floating point numbers; DLDataType; DLDevice; str (std::string in C++); and DRef.

Returns:

return_value – The return value of the function call.

Return type:

DRef

Notes

Examples of unsupported types: NDArray and DLTensor; TVM Objects, including PackedFunc, Module, and String.
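A minimal sketch of looking up a global function and calling it on all workers. It assumes `sess` is an existing Session; the helper name `call_global` is hypothetical, and the function name passed in must be one actually registered on the workers.

```python
def call_global(sess, name, *args):
    """Look up a global PackedFunc by name and call it on every worker.

    `sess` is expected to behave like tvm.runtime.disco.Session.
    `args` should contain only supported types (numbers, str, DRef, ...).
    """
    func = sess.get_global_func(name)  # a DRef to the function on each worker
    return sess.call_packed(func, *args)
```

Because the result is itself a reference to per-worker values, it can be passed straight into a later `call_packed` call without copying data back to the controller.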

copy_from_worker_0(host_array: NDArray, remote_array: DRef) None

Copy an NDArray from worker-0 to the controller-side NDArray.

Parameters:
  • host_array (NDArray) – The controller-side array that receives the data from worker-0.

  • remote_array (DRef) – The NDArray on worker-0.

copy_to_worker_0(host_array: NDArray, remote_array: DRef) None

Copy the controller-side NDArray to worker-0.

Parameters:
  • host_array (NDArray) – The controller-side array to be copied to worker-0.

  • remote_array (DRef) – The NDArray on worker-0.
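The two copy methods can be combined into a round trip through worker-0. This is a sketch assuming `sess` is an existing Session and that `host_src` and `host_dst` are controller-side arrays with the given shape and dtype; the helper name `round_trip_via_worker0` is hypothetical.

```python
def round_trip_via_worker0(sess, host_src, host_dst, shape, dtype):
    """Copy host_src to worker-0, then copy it back into host_dst.

    `sess` is expected to behave like tvm.runtime.disco.Session.
    """
    remote = sess.empty(shape, dtype)
    sess.copy_to_worker_0(host_src, remote)
    sess.copy_from_worker_0(host_dst, remote)
    # Wait for worker-0 to finish the queued copies before reading host_dst.
    sess.sync_worker_0()
    return host_dst
```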

empty(shape: Sequence[int], dtype: str, device: Device | None = None) DRef

Create an empty NDArray on all workers and attach them to a DRef.

Parameters:
  • shape (tuple of int) – The shape of the NDArray.

  • dtype (str) – The data type of the NDArray.

  • device (Optional[Device] = None) – The device of the NDArray.

Returns:

array – The created NDArray.

Return type:

DRef

gather_to_worker0(from_array: DRef, to_array: DRef) None

Gather an array from all other workers to worker-0.

Parameters:
  • from_array (DRef) – The array to be gathered from.

  • to_array (DRef) – The array to be gathered to.
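A pure-Python model of the gather direction, for illustration only. It assumes worker-0's destination stacks one chunk per worker along a leading axis, ordered by worker id; that layout is an assumption, and `gather_to_worker0_model` is a hypothetical name.

```python
def gather_to_worker0_model(per_worker_from):
    """Return each worker's to_array contents after a gather to worker-0.

    per_worker_from[i] is the flat chunk held by worker i.
    """
    # Worker-0 receives one row per worker (assumed layout).
    stacked = [list(chunk) for chunk in per_worker_from]
    # The other workers' destinations are not written; None marks "untouched".
    return [stacked] + [None] * (len(per_worker_from) - 1)
```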

get_global_func(name: str) DRef

Get a global function on workers.

Parameters:

name (str) – The name of the global function.

Returns:

func – The global packed function.

Return type:

DRef

init_ccl(ccl: str, *device_ids)

Initialize the underlying collective communication library.

Parameters:
  • ccl (str) – The name of the collective communication library. Currently supported libraries are nccl, rccl, and mpi.

  • *device_ids (int) – The device IDs to be used by the underlying communication library.
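A minimal sketch of initializing the library, assuming `sess` is an existing Session and that worker i should use device id i. That one-to-one mapping is an assumption for illustration, and the helper name `init_collectives` is hypothetical.

```python
def init_collectives(sess, num_workers, ccl="nccl"):
    """Initialize the collective library with device ids 0..num_workers-1.

    `sess` is expected to behave like tvm.runtime.disco.Session.
    """
    # Device ids are passed as variadic positional arguments.
    sess.init_ccl(ccl, *range(num_workers))
```

A different device assignment can be expressed by passing the ids explicitly, for example `sess.init_ccl("nccl", 2, 3)` for a two-worker session on devices 2 and 3.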

load_vm_module(path: str, device: Device | None = None) DModule

Load a VM module from a file.

Parameters:
  • path (str) – The path to the VM module file.

  • device (Optional[Device] = None) – The device to load the VM module to. Defaults to the default device of each worker.

Returns:

module – The loaded VM module.

Return type:

DModule

scatter_from_worker0(from_array: DRef, to_array: DRef) None

Scatter an array from worker-0 to all other workers.

Parameters:
  • from_array (DRef) – The array to be scattered from.

  • to_array (DRef) – The array to be scattered to.
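A pure-Python model of the scatter direction, for illustration only. It assumes the source on worker-0 has a leading axis of length `num_workers` and that worker i receives row i; that layout is an assumption, and `scatter_from_worker0_model` is a hypothetical name.

```python
def scatter_from_worker0_model(from_array_on_worker0, num_workers):
    """Return what each worker's to_array would hold after a scatter.

    from_array_on_worker0 is a list of rows, one row per worker (assumed layout).
    """
    if len(from_array_on_worker0) != num_workers:
        raise ValueError("expected one row per worker")
    # Worker i receives its own copy of row i.
    return [list(row) for row in from_array_on_worker0]
```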

sync_worker_0() None

Synchronize the controller with worker-0. The call blocks until worker-0 finishes executing all existing instructions.

class tvm.runtime.disco.ThreadedSession(num_workers: int)

A Disco session backed by multi-threading.