tvm.runtime.profiling

Registration of profiling objects in Python.

class tvm.runtime.profiling.Count(count: int)

An integer count of something.

class tvm.runtime.profiling.DeviceWrapper(dev: Device)

Wraps a tvm.runtime.Device

class tvm.runtime.profiling.Duration(duration: float)

A duration of something

class tvm.runtime.profiling.MetricCollector

Interface for user defined profiling metric collection.

class tvm.runtime.profiling.Percent(percent: float)

A percentage of something.

class tvm.runtime.profiling.Ratio(ratio: float)

A ratio of two things.

class tvm.runtime.profiling.Report(calls: Sequence[Dict[str, Object]], device_metrics: Dict[str, Dict[str, Object]], configuration: Dict[str, Object])

A container for information gathered during a profiling run.

calls

Per-call profiling metrics (function name, runtime, device, …).

Type:

Array[Dict[str, Object]]

device_metrics

Per-device metrics collected over the entire run.

Type:

Dict[Device, Dict[str, Object]]

csv()

Convert this profiling report into CSV format.

This only includes calls and not overall metrics.

Returns:

csv – Calls in CSV format.

Return type:

str
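Because csv() returns a plain string, it can be loaded with standard tooling. The following is a minimal sketch using only Python's standard library. The helper name parse_calls_csv and the sample text, including its column names, are invented placeholders for illustration; real csv() output will have different columns.

```python
import csv
import io


def parse_calls_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Placeholder text with invented column names (an assumption, not TVM output).
# In practice this string would come from Report.csv().
sample = "name,duration_us\nconv2d,120.0\nrelu,5.0\n"
parsed = parse_calls_csv(sample)
for row in parsed:
    print(row)
```

All values come back as strings, so numeric columns need an explicit conversion before any arithmetic.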

classmethod from_json(s)

Deserialize a report from JSON.

Parameters:

s (str) – Report serialized via json().

Returns:

report – The deserialized report.

Return type:

Report

json()

Convert this profiling report into JSON format.

Example output:

Returns:

json – Formatted JSON

Return type:

str

table(sort=True, aggregate=True, col_sums=True)

Generate a human-readable table

Parameters:
  • sort (bool) – If aggregate is True, whether to sort call frames by descending duration. If aggregate is False, whether to sort frames by order of appearance in the program.

  • aggregate (bool) – Whether to join multiple calls to the same op into a single line.

  • col_sums (bool) – Whether to include the sum of each column.

Returns:

table – A human-readable table

Return type:

str
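To make the aggregate and sort options concrete, here is a small pure-Python sketch of the idea they describe: joining calls to the same op into one row, then ordering rows by descending total duration. It is an illustration of the semantics only, not TVM's implementation; the function aggregate_calls and the sample data are hypothetical.

```python
from collections import defaultdict


def aggregate_calls(calls, sort=True):
    """Sum durations of calls sharing a name; optionally sort by descending total."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, duration_us in calls:
        totals[name] += duration_us
        counts[name] += 1
    # One row per op: (name, total duration, number of calls).
    rows = [(name, totals[name], counts[name]) for name in totals]
    if sort:
        rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical (name, duration in microseconds) pairs, not real profiler output.
calls = [("conv2d", 120.0), ("relu", 5.0), ("conv2d", 80.0), ("dense", 40.0)]
rows = aggregate_calls(calls)
for row in rows:
    print(row)
```

With sort=False the sketch keeps rows in order of first appearance, which loosely mirrors the "order of appearance" behavior described for the unaggregated case.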

tvm.runtime.profiling.profile_function(mod, dev, collectors, func_name=None, warmup_iters=10)

Collect performance information of a function execution. Usually used with a compiled PrimFunc.

This information can include performance counters like cache hits and FLOPs that are useful in debugging performance issues of individual PrimFuncs. Different metrics can be collected depending on which MetricCollector is used.

Example

Parameters:
  • mod (Module) – Module containing the function to profile.

  • dev (Device) – Device to run the function on.

  • collectors (List[MetricCollector]) – MetricCollector instances which will collect performance information.

  • func_name (Optional[str]) – Name of the function in mod to profile. Defaults to the entry_name of mod.

  • warmup_iters (int) – Number of iterations to run the function before collecting performance information. Recommended to set this larger than 0 for consistent cache effects. Defaults to 10.

Returns:

prof – PackedFunc which takes the same arguments as mod[func_name] and returns performance metrics as a Dict[str, ObjectRef], where values can be CountNode, DurationNode, or PercentNode.

Return type:

PackedFunc[args, Dict[str, ObjectRef]]
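A possible way to wrap this call is sketched below. The helper profile_with_collectors is hypothetical and not part of TVM; it assumes the caller already has a compiled module, a device, a list of MetricCollector instances, and the arguments the profiled function expects. TVM is imported inside the function so the helper can be defined on a machine without TVM installed.

```python
def profile_with_collectors(mod, dev, collectors, *args, func_name=None):
    """Profile one function in `mod` and return its metrics as a dict of strings.

    Hypothetical convenience wrapper around
    tvm.runtime.profiling.profile_function.
    """
    # Imported lazily so that defining this helper does not require TVM.
    from tvm.runtime import profiling

    # profile_function returns a callable taking the same arguments
    # as mod[func_name].
    prof = profiling.profile_function(mod, dev, collectors, func_name=func_name)
    metrics = prof(*args)
    # Stringify names and values for easy printing or logging.
    return {str(name): str(value) for name, value in metrics.items()}
```

Which metrics appear in the result depends entirely on the collectors passed in; warmup_iters is left at its default of 10 here.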