
# Tutorial: TinyQ

TinyQ is a PyTorch library for Post-Training Quantization of neural network models, focused primarily on `nn.Linear` layers. The central Quantizer object takes a standard model and, using a Quantization Method such as W8A32 or W8A16, replaces its standard layers with Custom Quantized Layers. These custom layers store weights in a compressed 8-bit format using Weight Quantization Math and compute efficiently via Quantized Forward Pass Functions, making the model smaller and faster at inference.
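For orientation, end-to-end usage typically looks like the sketch below. The `Quantizer` constructor and `quantize` method shown in the comments are hypothetical placeholders, not TinyQ's confirmed API; the Quantizer Class chapter documents the real interface.

```python
import torch
import torch.nn as nn

# A toy model containing the nn.Linear layers that TinyQ targets.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Hypothetical workflow (names are placeholders; see the Quantizer Class
# chapter for the actual interface):
#   quantizer = Quantizer(method="w8a16")
#   model = quantizer.quantize(model)

# Inference on the quantized model then proceeds as usual:
with torch.no_grad():
    logits = model(torch.randn(1, 128))
print(logits.shape)  # torch.Size([1, 10])
```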

## Visual Overview

```mermaid
flowchart TD
    A0["Quantizer Class"]
    A1["Quantization Methods (W8A32, W8A16)"]
    A2["Custom Quantized Layers"]
    A3["Weight Quantization Math"]
    A4["Quantized Forward Pass Functions"]
    A5["Model Structure Replacement"]
    A6["Model Handling & Utilities"]

    A0 -- "Orchestrates" --> A5
    A0 -- "Applies" --> A1
    A5 -- "Installs" --> A2
    A2 -- "Use for weights" --> A3
    A2 -- "Implement with" --> A4
    A1 -- "Define logic for" --> A4
    A6 -- "Supports workflow of" --> A0
```
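To make the Weight Quantization Math and Quantized Forward Pass Functions nodes above concrete, here is a generic, runnable sketch of symmetric per-channel int8 weight quantization and a dequantize-then-matmul forward pass. This is a common scheme shown as an assumption for illustration; TinyQ's actual math and kernels are covered in chapters 5 and 6.

```python
import torch
import torch.nn.functional as F

def quantize_w8_per_channel(weight: torch.Tensor):
    # Symmetric per-output-channel int8 quantization: one scale per row,
    # chosen so that round(weight / scale) fits in [-127, 127].
    scales = weight.abs().max(dim=-1, keepdim=True).values / 127.0
    q = torch.round(weight / scales).to(torch.int8)
    return q, scales

def w8_forward(x, q_weight, scales, bias=None):
    # Dequantize-then-matmul forward pass: cast the int8 weights up to the
    # activation dtype (fp32 for W8A32, fp16 for W8A16) and rescale.
    w = q_weight.to(x.dtype) * scales.to(x.dtype)
    return F.linear(x, w, bias)

# Round-trip demo in fp32 (the W8A32 case); on GPU, pass fp16 activations
# for the W8A16 case.
w = torch.randn(10, 128)
q, s = quantize_w8_per_channel(w)
x = torch.randn(4, 128)
print(w8_forward(x, q, s).shape)        # torch.Size([4, 10])
print((w - q.float() * s).abs().max())  # small reconstruction error
```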

## Chapters

  1. Quantizer Class
  2. Quantization Methods (W8A32, W8A16)
  3. Custom Quantized Layers
  4. Model Structure Replacement
  5. Weight Quantization Math
  6. Quantized Forward Pass Functions
  7. Model Handling & Utilities