# Scheduling tensor programs

Tensor programs are ubiquitous, ranging from general matrix multiplication to deep neural networks. However, due to extensive floating-point computation, executing them can incur high **latency**. Indeed, it is well known that the rejuvenation of neural networks is very much due to the use of *GPUs* for these computations; accelerating tensor programs is therefore of great importance.

The remaining question is: *how should we achieve that*? Below I group the methods into two categories:

- Optimize the tensor program itself, with either **reformulation** or **approximation**:
    - Reformulation: this optimizes the algorithm from which the tensor program is created. For example, Strassen's algorithm accelerates matrix multiplication because it uses fewer multiplications (see the sketch after this list).
    - Approximation: this kind of optimization generates a new tensor program that is similar but "cheaper". For example, by quantization one can obtain a significantly smaller network whose performance is usually close to that of the original net.
- Optimize the execution on a given hardware:
    - This keeps the tensor program untouched, and instead optimizes how it is executed on the hardware at hand. For example, the development of cuDNN is one such endeavour.
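To make the reformulation idea concrete, here is a minimal sketch of Strassen's algorithm in Python with `numpy` (the `strassen` name, the power-of-two restriction, and the `leaf` cutoff are my own simplifications):

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Strassen's reformulation of matrix multiplication.

    Assumes A and B are n x n with n a power of two; falls back to
    ordinary multiplication below the `leaf` cutoff.
    """
    n = A.shape[0]
    if n <= leaf:
        return A @ B

    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven recursive multiplications instead of the naive eight.
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)

    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

Saving one multiplication per recursion level is what drops the asymptotic cost from O(n^3) to O(n^log2(7)), roughly O(n^2.81).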

We are primarily interested in the second category, but with one more aspiration: **automate the process.** This approach requires two ingredients:

- A language: a domain-specific language through which tensor programs can be specified.
- A compiler: a tool that can translate the high-level source code into an **efficient** low-level implementation.

Below are two instances of the above idea.

# Halide

Halide is a `C++`-embedded language, proposed by Jonathan Ragan-Kelley and Andrew Adams in 2012.
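The core idea Halide introduced is to decouple the *algorithm* (what to compute) from the *schedule* (how to compute it). Below is a minimal sketch using Halide's Python bindings rather than the C++ API (`pip install halide`; the function and variable names are illustrative):

```python
import halide as hl

x, y = hl.Var("x"), hl.Var("y")

# The algorithm: a pure function from coordinates to values, built
# out of ordinary host-language expressions (hence "embedded").
f = hl.Func("f")
f[x, y] = hl.sin(x * 0.1) * hl.cos(y * 0.1)

# With no schedule specified, Halide compiles a default loop nest.
out = f.realize([256, 256])
```

The separation matters because the same algorithm can then be paired with many different schedules, which is exactly the search space an auto-scheduler explores.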

## Auto-schedulers

- *OpenTuner: An Extensible Framework for Program Autotuning*, Jason Ansel et al., PACT 2014.
- *Automatically Scheduling Halide Image Pipelines*, Mullapudi et al., SIGGRAPH 2016.
- *Differentiable Programming for Image Processing and Deep Learning in Halide*, Tzu-Mao Li et al., SIGGRAPH 2018.
- *Learning to Optimize Halide with Tree Search and Random Programs*, Andrew Adams et al., SIGGRAPH 2019.
- *Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU*, Luke Anderson et al., Proceedings of the ACM on Programming Languages (OOPSLA), 2021.

# TVM

TVM is a `Python`-embedded language; its grammar is very similar to that of `tensorflow` and `pytorch`.
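Here is a minimal sketch of how a tensor program is specified and scheduled with TVM's tensor-expression API (`tvm.te`), the layer its auto-schedulers operate on; this assumes a TVM release that still ships the classic `te` scheduling API, and the vector-add workload and split factor are illustrative choices:

```python
import tvm
from tvm import te

# Algorithm: declare a vector addition as a tensor expression.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# Schedule: decide how the computation is lowered to loops.
s = te.create_schedule(C.op)
outer, inner = s[C].split(C.op.axis[0], factor=8)
s[C].parallel(outer)
s[C].vectorize(inner)

# Inspect the lowered IR, then compile for the CPU.
print(tvm.lower(s, [A, B, C], simple_mode=True))
fadd = tvm.build(s, [A, B, C], target="llvm")
```

An auto-scheduler's job is to make the `split`/`parallel`/`vectorize` decisions (and many more) automatically instead of leaving them to the programmer.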

## Auto-schedulers

- AutoTVM: *Learning to Optimize Tensor Programs*, Chen et al., NeurIPS 2018.
- Ansor: *Ansor: Generating High-Performance Tensor Programs for Deep Learning*, Zheng et al., USENIX OSDI 2020.

# Difference between Halide and TVM

## Language expressiveness difference

Halide supports `DAG`-like computational logic. TVM is fully compatible with `Tensorflow` and `PyTorch`, so it also supports recurrent network architectures.

## Auto-scheduler algorithms difference

- The auto-scheduler in Halide schedules a program `stage by stage`, where a `stage` is a functional object. At each stage, it makes two types of decisions (illustrated in the sketch after this list):
    - Cross-stage granularity: at which loop level of its consumer the stage should be computed and stored (e.g. `compute_root` vs. `compute_at`).
    - Intra-stage order: how the loops within the stage are ordered, tiled, vectorized, and parallelized.
- The auto-scheduler in TVM divides the computation graph into independent `operators` (an operator is similar to a stage in Halide), then optimizes each operator separately.
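To make the two decision types concrete, here is a hand-written schedule for the classic separable blur in Halide's Python bindings; the pipeline and all scheduling constants are illustrative, and an auto-scheduler would search over exactly these kinds of choices:

```python
import halide as hl

x, y, xi, yi = hl.Var("x"), hl.Var("y"), hl.Var("xi"), hl.Var("yi")

inp = hl.Func("inp")
blur_x = hl.Func("blur_x")
blur_y = hl.Func("blur_y")

# A three-stage pipeline: a synthetic input and a separable blur.
inp[x, y] = hl.cast(hl.Float(32), x + y)
blur_x[x, y] = (inp[x - 1, y] + inp[x, y] + inp[x + 1, y]) / 3
blur_y[x, y] = (blur_x[x, y - 1] + blur_x[x, y] + blur_x[x, y + 1]) / 3

# Intra-stage order: tile blur_y, vectorize the inner loop,
# and parallelize across rows of tiles.
blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y)

# Cross-stage granularity: compute blur_x per tile of blur_y
# (instead of computing the whole stage up front with compute_root).
blur_x.compute_at(blur_y, x).vectorize(x, 8)

blur_y.realize([2048, 2048])
```

Every stage needs both kinds of decisions, and the choices interact across stages, which is what makes the scheduling space large enough to warrant automated search.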