https://github.com/pytorch/workshops/tree/master/ASPLOS_2024
The frontend of regular compilers (lexing/parsing) is very different from the frontend of ML compilers (graph capture, whether through tracing, bytecode analysis, etc.)
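To make graph capture concrete, here is a minimal sketch using torch.fx symbolic tracing (the TinyMLP module is made up purely for illustration):

```python
import torch
import torch.nn as nn
import torch.fx

# A toy module, purely for illustration.
class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 8)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Tracing-based graph capture: torch.fx records the ops into a graph IR
# instead of lexing/parsing source text like a traditional compiler frontend.
gm = torch.fx.symbolic_trace(TinyMLP())
print(gm.graph)  # placeholder -> call_module / call_function -> output nodes
```

TorchDynamo does the same job via bytecode analysis instead of tracing, but the captured artifact is still a graph of tensor ops rather than an AST.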
Similarly, the backend ends up looking fairly different as well. For one, ML compilers typically start with much more "semantic" information than traditional compilers. For example, they might do optimizations like "merge two matmuls into a single matmul". Another difference is that the overall structure ends up much "simpler": they usually support very limited forms of control flow, so much of the work in traditional compiler passes for handling CFGs doesn't matter much.
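A sketch of the algebra behind that kind of "semantic" rewrite: two matmuls sharing the same left operand can be merged into one larger matmul by concatenating the right operands (plain NumPy, just to show what a graph-level pass relies on; shapes are arbitrary):

```python
import numpy as np

A = np.random.randn(64, 128)
B1 = np.random.randn(128, 256)
B2 = np.random.randn(128, 256)

# Unfused: two separate matmuls over the same input A.
Y1, Y2 = A @ B1, A @ B2

# Fused: one matmul against the concatenated weights, then split the result.
# A graph-level pass can do this rewrite because it knows these nodes are matmuls.
Y = A @ np.concatenate([B1, B2], axis=1)
Y1_fused, Y2_fused = Y[:, :256], Y[:, 256:]

assert np.allclose(Y1, Y1_fused) and np.allclose(Y2, Y2_fused)
```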
Finally, most traditional compilers are focused on optimizations for CPUs, but almost all ML is done on GPUs or other accelerators.
Frameworks: PyTorch, TensorFlow. Compiler stacks and kernel libraries: MLIR dialects, TVM, XLA, Glow, cuDNN.
- IRs are generated by compilers. In ML compilers, the IR is a computation graph.
- To generate machine code from the IR, the compiler uses codegen; LLVM is a common example. This process is called lowering (toy sketch after this list).
- TensorFlow XLA, NVCC, and TVM all use LLVM for codegen.
- Domain-specific compilers: NVCC, XLA; PyTorch uses XLA for TPUs and Glow for other hardware.
- Third-party compilers: TVM, for building custom compilation targets.
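A toy illustration of "IR as a computation graph" plus "lowering to machine-level code". Everything here is invented for illustration; real compilers lower through LLVM IR or PTX, not pseudo-assembly strings:

```python
# Toy computation-graph IR: each node is (op, inputs).
graph = {
    "x":   ("input", []),
    "w":   ("input", []),
    "mm":  ("matmul", ["x", "w"]),
    "out": ("relu",   ["mm"]),
}

# Toy "lowering": walk the graph (insertion order is already topological here)
# and emit pseudo-instructions, standing in for what LLVM/PTX codegen produces.
def lower(graph):
    code = []
    for name, (op, inputs) in graph.items():
        if op == "input":
            code.append(f"LOAD  {name}")
        else:
            code.append(f"{op.upper():6s} {name} <- {', '.join(inputs)}")
    return code

print("\n".join(lower(graph)))
```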
MLIR helps you build your own compiler.
Instead of compiling to run on specific hardware, you can compile to run in the browser (the WASM format, which can be run with JavaScript). Emscripten is a WASM compiler (it also uses LLVM codegen), but it only compiles from C and C++ into WASM. Scailable is supposed to convert scikit-learn models into WASM. TVM also compiles to WASM.
GCC compiles C/C++ code to machine code. LLVM is good for CPU and GPU, but MLIR is a general framework for any hardware; LLVM IR can be treated as just one MLIR dialect, which is the sense in which LLVM is a subset of MLIR. MLIR is a meta-compiler used to build other compilers.
- Multi-GPU
- Multi-node
Tech Stack
- Triton + PyTorch + MLIR (minimal kernel sketch after the links below)
- Pallas + JAX + XLA
- https://www.kapilsharma.dev/posts/deep-dive-into-triton-internals/
- https://www.kapilsharma.dev/posts/deep-dive-into-triton-internals-2/
- https://www.kapilsharma.dev/posts/deep-dive-into-triton-internals-3/
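A minimal Triton vector-add kernel, as a sketch of what the Triton + PyTorch layer of the stack looks like (the standard tutorial-style example, with an arbitrary block size of 1024; assumes a CUDA-capable GPU):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which block this program handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                    # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```

Triton lowers this Python through its MLIR-based pipeline down to PTX, which is what the internals posts above walk through.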
IR to machine code: x86 for CPU, PTX for GPU.
Polyhedral model: The polyhedral model in compilers is a mathematical approach used for optimizing loop nests in high-level programming. In this model, loops are represented as geometric shapes (polyhedra) in a high-dimensional space, where each point in the shape corresponds to an individual iteration of the loop. The edges and faces of the polyhedron represent the relationships and dependencies between different iterations. This representation allows the compiler to perform sophisticated transformations on the loops, such as tiling, fusion, or parallelization, by manipulating the shapes in this abstract space.
These transformations can significantly improve the performance of the program, particularly for applications with complex loop structures and large amounts of data processing (like deep learning!). The polyhedral model excels at capturing and optimizing the parallelism and locality in loop nests, making it a powerful tool for optimizing the core operations found in a neural network, such as matrix multiplication.
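A rough sketch of the kind of transformation the polyhedral model reasons about: the same matmul iteration space, first as a naive loop nest, then tiled so blocks of the iteration space (and the data they touch) are reused while still in cache. Pure Python with an arbitrary tile size, just to show the loop structure:

```python
import numpy as np

N, TILE = 64, 16
A = np.random.randn(N, N)
B = np.random.randn(N, N)

# Naive loop nest: walks the (i, j, k) iteration space row by row.
C_naive = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        for k in range(N):
            C_naive[i, j] += A[i, k] * B[k, j]

# Tiled loop nest: the same iteration space, partitioned into TILE-sized blocks.
# Partitioning the polyhedron like this is exactly the transformation a
# polyhedral optimizer performs to improve locality.
C_tiled = np.zeros((N, N))
for i0 in range(0, N, TILE):
    for j0 in range(0, N, TILE):
        for k0 in range(0, N, TILE):
            for i in range(i0, i0 + TILE):
                for j in range(j0, j0 + TILE):
                    for k in range(k0, k0 + TILE):
                        C_tiled[i, j] += A[i, k] * B[k, j]

assert np.allclose(C_naive, C_tiled) and np.allclose(C_naive, A @ B)
```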
cuBLAS for linear algebra on GPU, cuDNN for DL primitives on GPU, Eigen for DL on CPU.
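These libraries sit underneath the frameworks: a PyTorch matmul or convolution on a CUDA tensor dispatches to cuBLAS/cuDNN without any explicit call. A minimal sketch, assuming a CUDA-enabled PyTorch build:

```python
import torch
import torch.nn.functional as F

torch.backends.cudnn.benchmark = True   # let cuDNN autotune conv algorithms

x = torch.randn(8, 3, 224, 224, device="cuda")
w = torch.randn(64, 3, 7, 7, device="cuda")
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

y = F.conv2d(x, w, stride=2, padding=3)  # routed to cuDNN under the hood
c = a @ b                                # routed to cuBLAS under the hood
```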
- Computational Performance: https://d2l.ai/chapter_computational-performance/index.html
- MLC Course (TVM): https://mlc.ai/summer22/schedule
- Triton Internals (MLIR): https://www.kapilsharma.dev/posts/deep-dive-into-triton-internals/
- XLA Deep Dive: https://medium.com/@muhammedashraf2661/demystifying-xla-unlocking-the-power-of-accelerated-linear-algebra-9b62f8180dbd
- Torch Dynamo Deep Dive: https://pytorch.org/docs/main/torch.compiler_dynamo_deepdive.html
- Performance tuning by Paul Bridger: https://paulbridger.com/
- Ultra-Scale book: from-scratch implementations of all the distributed training algorithms.
- Model Optimization (Distillation, Quantization, Pruning) - TBD Source
- 100 Days of CUDA
- ML Compilation: https://mlc.ai/summer22/schedule
- DL Systems: https://dlsyscourse.org/lectures/