6. GPU and Hardware Acceleration
Machine Learning Compiler
Table Of Contents
  • 1. Introduction
  • 2. Tensor Program Abstraction
    • 2.1. Primitive Tensor Function
    • 2.2. Tensor Program Abstraction
    • 2.3. Summary
    • 2.4. TensorIR: Tensor Program Abstraction Case Study
    • 2.5. Exercises for TensorIR
  • 3. End to End Model Execution
  • 4. Automatic Program Optimization
  • 5. Integration with Machine Learning Frameworks
  • 6. GPU and Hardware Acceleration
    • 6.1. Part 1
    • 6.2. Part 2
  • 7. Computational Graph Optimization

6. GPU and Hardware Acceleration

  • 6.1. Part 1
    • 6.1.1. Install packages
    • 6.1.2. Preparations
    • 6.1.3. GPU Architecture
    • 6.1.4. Window Sum Example
    • 6.1.5. Matrix Multiplication
    • 6.1.6. Shared Memory Blocking
    • 6.1.7. Leveraging Automatic Program Optimization
    • 6.1.8. Summary
  • 6.2. Part 2
    • 6.2.1. Preparations
    • 6.2.2. Hardware Specialization Trend
    • 6.2.3. Tensorization
    • 6.2.4. Discussions
    • 6.2.5. Summary