
# Tutorial: TinyQ

TinyQ is a PyTorch library for Post-Training Quantization of neural network models, focused primarily on `nn.Linear` layers. The central Quantizer object takes a standard model and, using a Quantization Method such as W8A32 or W8A16, replaces its standard layers with Custom Quantized Layers. These custom layers store weights in a compressed 8-bit format using Weight Quantization Math and compute efficiently via Quantized Forward Pass Functions, making the model smaller and faster at inference.
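For orientation, end-to-end usage typically looks like the sketch below. The `Quantizer` constructor and `quantize` method shown in the comments are hypothetical placeholders, not TinyQ's confirmed API; the Quantizer Class chapter documents the real interface.

```python
import torch
import torch.nn as nn

# A toy model containing the nn.Linear layers that TinyQ targets.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Hypothetical workflow (names are placeholders; see the Quantizer Class
# chapter for the actual interface):
#   quantizer = Quantizer(method="w8a16")
#   model = quantizer.quantize(model)

# Inference on the quantized model then proceeds as usual:
with torch.no_grad():
    logits = model(torch.randn(1, 128))
print(logits.shape)  # torch.Size([1, 10])
```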

## Visual Overview

```mermaid
flowchart TD
    A0["Quantizer Class"]
    A1["Quantization Methods (W8A32, W8A16)"]
    A2["Custom Quantized Layers"]
    A3["Weight Quantization Math"]
    A4["Quantized Forward Pass Functions"]
    A5["Model Structure Replacement"]
    A6["Model Handling & Utilities"]

    A0 -- "Orchestrates" --> A5
    A0 -- "Applies" --> A1
    A5 -- "Installs" --> A2
    A2 -- "Use for weights" --> A3
    A2 -- "Implement with" --> A4
    A1 -- "Define logic for" --> A4
    A6 -- "Supports workflow of" --> A0
```
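To make the Weight Quantization Math and Quantized Forward Pass Functions nodes above concrete, here is a generic, runnable sketch of symmetric per-channel int8 weight quantization and a dequantize-then-matmul forward pass. This is a common scheme shown as an assumption for illustration; TinyQ's actual math and kernels are covered in chapters 5 and 6.

```python
import torch
import torch.nn.functional as F

def quantize_w8_per_channel(weight: torch.Tensor):
    # Symmetric per-output-channel int8 quantization: one scale per row,
    # chosen so that round(weight / scale) fits in [-127, 127].
    scales = weight.abs().max(dim=-1, keepdim=True).values / 127.0
    q = torch.round(weight / scales).to(torch.int8)
    return q, scales

def w8_forward(x, q_weight, scales, bias=None):
    # Dequantize-then-matmul forward pass: cast the int8 weights up to the
    # activation dtype (fp32 for W8A32, fp16 for W8A16) and rescale.
    w = q_weight.to(x.dtype) * scales.to(x.dtype)
    return F.linear(x, w, bias)

# Round-trip demo in fp32 (the W8A32 case); on GPU, pass fp16 activations
# for the W8A16 case.
w = torch.randn(10, 128)
q, s = quantize_w8_per_channel(w)
x = torch.randn(4, 128)
print(w8_forward(x, q, s).shape)        # torch.Size([4, 10])
print((w - q.float() * s).abs().max())  # small reconstruction error
```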

## Chapters

  1. Quantizer Class
  2. Quantization Methods (W8A32, W8A16)
  3. Custom Quantized Layers
  4. Model Structure Replacement
  5. Weight Quantization Math
  6. Quantized Forward Pass Functions
  7. Model Handling & Utilities