Primitive Tensor Function
-------------------------

The introductory overview showed that the MLC process could be viewed as
transformations among tensor functions. A typical model execution
involves several computation steps that transform tensors from input to
the final prediction, and each unit step is called a primitive tensor
function.

.. _fig_primitive_tensor_func:

.. figure:: ../img/primitive_tensor_func.png

   Primitive Tensor Function


In the above figure, the tensor operator linear, add, relu, and softmax
are all primitive tensor functions. Notably, many different abstractions
can represent (and implement) the same primitive tensor function add (as
shown in the figure below). We can choose to call into pre-built
framework libraries(e.g. torch.add or numpy.add), and leverage an
implementation in python. In practice, primitive functions are
implemented in low-level languages such as C/C++ with sometimes a
mixture of assembly code.

.. _fig_tensor_func_abstractions:

.. figure:: ../img/tensor_func_abstractions.png

   Different forms of the same primitive tensor function


Many frameworks offer machine learning compilation procedures to
transform primitive tensor functions into more specialized ones for the
particular workload and deployment environment.

.. _fig_tensor_func_transformation:

.. figure:: ../img/tensor_func_transformation.png

   Transformations between primitive tensor functions


The above figure shows an example where the implementation of the
primitive tensor function add gets transformed into a different
implementation. The particular code on the right is a pseudo-code
representing possible set optimizations: the loop gets split into units
of length ``4`` where ``f32x4`` add corresponds to a special vector add
function that carries out the computation.

Tensor Program Abstraction
--------------------------

The last section talks about the need to transform primitive tensor
functions. In order for us to effectively do so, we need an effective
abstraction to represent the programs.

Usually, a typical abstraction for primitive tensor function
implementation contains the following elements: multi-dimensional
buffers, loop nests that drive the tensor computations, and finally, the
compute statements themselves.

.. _fig_tensor_func_elements:

.. figure:: ../img/tensor_func_elements.png

   The typical elements in a primitive tensor function


We call this type of abstraction tensor program abstraction. One
important property of tensor program abstraction is the ability to
change the program through a sequence of transformations pragmatically.

.. _fig_tensor_func_seq_transform:

.. figure:: ../img/tensor_func_seq_transform.png

   Sequential transformations on a primitive tensor function


For example, we should be able to use a set of transformation
primitives(split, parallelize, vectorize) to take the initial loop
program and transform it into the program on the right-hand side.

Extra Structure in Tensor Program Abstraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Importantly, we cannot perform arbitrary transformations on the program
as some computations depend on the order of the loop. Luckily, most
primitive tensor functions we are interested in have good properties
(such as independence among loop iterations).

Tensor programs can incorporate this extra information as part of the
program to facilitate program transformations.

.. _fig_tensor_func_iteration:

.. figure:: ../img/tensor_func_iteration.png

   Iteration is the extra information for tensor programs


For example, the above program contains the additional
``T.axis.spatial`` annotation, which shows that the particular variable
``vi`` is mapped to ``i``, and all the iterations are independent. This
information is not necessary to execute the particular program but comes
in handy when we transform the program. In this case, we will know that
we can safely parallelize or reorder loops related to ``vi`` as long as
we visit all the index elements from ``0`` to ``128``.

Summary
-------

-  Primitive tensor function refers to the single unit of computation in
   model execution.

   -  A MLC process can choose to transform implementation of primitive
      tensor functions.

-  Tensor program is an effective abstraction to represent primitive
   tensor functions.

   -  Key elements include: multi-dimensional buffer, loop nests,
      computation statement.
   -  Program-based transformations can be used to optimize tensor
      programs.
   -  Extra structure can help to provide more information to the
      transformations.