
akashsonowal/ml-systems-cookbook


ASPLOS 2024 Workshop

https://github.com/pytorch/workshops/tree/master/ASPLOS_2024

The frontend of regular compilers (lexing/parsing) is very different from the frontend of ML compilers (graph capture, whether through tracing, bytecode analysis, etc.).
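For instance, here is a minimal sketch of graph capture via tracing using torch.fx (the small MLP module is just for illustration):

```python
import torch
import torch.fx

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(4, 8)
        self.fc2 = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Tracing runs forward() with proxy tensors and records every op
# into a graph -- the ML-compiler analogue of a frontend.
gm = torch.fx.symbolic_trace(MLP())
print(gm.graph)  # the captured computation graph
```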

Similarly, the backend ends up looking fairly different as well. For one, ML compilers typically start with much more "semantic" information than traditional compilers; for example, they might perform optimizations like "merge two matmuls into a single matmul" (see the sketch below). Another difference is that the overall structure ends up much simpler: ML compilers usually support only very limited forms of control flow, so much of the work traditional compiler passes do to handle CFGs doesn't matter much.
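As a sketch of what such a "semantic" optimization looks like, two matmuls that share a left operand can be merged into one by concatenating the weight matrices (the shapes here are arbitrary):

```python
import torch

A = torch.randn(8, 16)
B1, B2 = torch.randn(16, 32), torch.randn(16, 32)

# Two separate matmuls sharing the same input...
out1, out2 = A @ B1, A @ B2

# ...merged into a single larger matmul, then split:
# one kernel launch instead of two.
merged = A @ torch.cat([B1, B2], dim=1)
m1, m2 = merged.split(32, dim=1)

assert torch.allclose(out1, m1) and torch.allclose(out2, m2)
```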

Finally, most traditional compilers are focused on optimizations for CPUs, but almost all ML is done on GPUs or other accelerators.

Computation


ML Compilers that support Frameworks

  • MLIR dialects
  • TVM
  • XLA
  • PyTorch Glow
  • cuDNN

Frameworks

  • PyTorch
  • TensorFlow

Quantization

Framework -> IR -> Machine Code

  • IRs are generated by compilers; in ML compilers, the IR is a computation graph.
  • To generate machine code from an IR, the compiler uses a code generator (codegen); LLVM is a common example. This process is called lowering.
  • TensorFlow XLA, NVCC, and TVM all use LLVM.
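A minimal sketch of this pipeline, assuming PyTorch 2.x: torch.compile captures the function into an FX graph (the IR) and lowers it through the default Inductor backend to Triton/C++ code:

```python
import torch

def f(x):
    # Captured into an FX graph, then lowered to machine code.
    return torch.relu(x) * 2

compiled = torch.compile(f)  # framework -> IR -> codegen
print(compiled(torch.randn(4)))
```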

Different types of compilers

  • Domain-specific compilers: NVCC, XLA. PyTorch uses XLA for TPUs and Glow for other hardware.
  • Third-party compilers: TVM, for building custom compilers.

MLIR helps you build your own compiler.

WASM (Web Assembly):

Instead of compiling to run on specific hardware, you can compile to run in the browser (the WASM format, which can be invoked from JavaScript). Emscripten is a WASM compiler (it also uses LLVM for codegen), but it only compiles C and C++ into WASM. Scailable is supposed to convert scikit-learn models into WASM, and TVM can also compile to WASM.

Extra

GCC compiles C/C++ code to machine code. LLVM is good for CPUs and GPUs, but MLIR is a general framework for targeting any hardware. LLVM can be viewed as a subset of MLIR (LLVM IR is one MLIR dialect). MLIR is a meta-compiler: infrastructure used to build other compilers.

Communication


Distributed

  • Multi-GPU
  • Multi-node
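A minimal multi-process all-reduce sketch, assuming a torchrun launch (which sets the rank and rendezvous environment variables):

```python
import torch
import torch.distributed as dist

# Run with: torchrun --nproc_per_node=2 this_script.py
def main():
    dist.init_process_group(backend="gloo")  # use "nccl" for multi-GPU
    rank = dist.get_rank()
    t = torch.tensor([float(rank)])
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # sum across all ranks
    print(f"rank {rank}: {t.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```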

Tech Stack

  • Triton + PyTorch + MLIR
  • Pallas + JAX + XLA
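As a taste of the first stack, here is a minimal Triton vector-add kernel (the standard tutorial example; assumes a CUDA device is available):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)               # one program per block
    offs = pid * BLOCK + tl.arange(0, BLOCK)  # element offsets
    mask = offs < n                           # guard the ragged tail
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)
add_kernel[grid](x, y, out, x.numel(), BLOCK=256)
assert torch.allclose(out, x + y)
```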

Resources

IR to machine code: x86 for CPUs, PTX for GPUs.

Polyhedral model: The polyhedral model in compilers is a mathematical approach used for optimizing loop nests in high-level programming. In this model, loops are represented as geometric shapes (polyhedra) in a high-dimensional space, where each point in the shape corresponds to an individual iteration of the loop. The edges and faces of the polyhedron represent the relationships and dependencies between different iterations. This representation allows the compiler to perform sophisticated transformations on the loops, such as tiling, fusion, or parallelization, by manipulating the shapes in this abstract space.

These transformations can significantly improve the performance of the program, particularly for applications with complex loop structures and large amounts of data processing (like deep learning!). The polyhedral model excels at capturing and optimizing the parallelism and locality in loop nests, making it a powerful tool for optimizing the core operations found in a neural network, such as matrix multiplication.
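As a hand-written illustration of one such transformation, here is loop tiling applied to a matmul (plain Python with NumPy for readability; a polyhedral compiler derives this from the iteration space automatically):

```python
import numpy as np

def matmul_tiled(A, B, tile=32):
    # Iterate over blocks so each tile of A, B, and C stays hot in
    # cache -- the locality win that tiling buys.
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return C

A, B = np.random.rand(64, 64), np.random.rand(64, 64)
assert np.allclose(matmul_tiled(A, B), A @ B)
```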

  • cuBLAS: linear algebra on GPUs
  • cuDNN: deep learning on GPUs
  • Eigen: deep learning on CPUs

Roadmap

  1. Computational Performance: https://d2l.ai/chapter_computational-performance/index.html
  2. MLC Course (TVM): https://mlc.ai/summer22/schedule
  3. Triton Internals (MLIR): https://www.kapilsharma.dev/posts/deep-dive-into-triton-internals/
  4. XLA Deep Dive: https://medium.com/@muhammedashraf2661/demystifying-xla-unlocking-the-power-of-accelerated-linear-algebra-9b62f8180dbd
  5. Torch Dynamo Deep Dive: https://pytorch.org/docs/main/torch.compiler_dynamo_deepdive.html
  6. Performance tuning by Paul: https://paulbridger.com/
  7. Ultra Scale book: implement all the distributed algorithms from scratch
  8. Model Optimization (Distillation, Quantization, Pruning) - TBD Source
  9. 100 days of CUDA

Project

Courses

  1. ML Compilation: https://mlc.ai/summer22/schedule
  2. DL Systems: https://dlsyscourse.org/lectures/
