Home
TinyQ is a PyTorch library for Post-Training Quantization of neural network models, focusing primarily on nn.Linear layers. The central Quantizer object takes a standard model and, using specific Quantization Methods such as W8A32 or W8A16, replaces standard layers with Custom Quantized Layers. These custom layers store weights in a compressed format using Weight Quantization Math and compute efficiently via Quantized Forward Pass Functions, making the model smaller and faster at inference.
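As a sketch of this workflow (the import path, the `quantize` method name, and the `method` argument are assumptions for illustration; only the Quantizer class and the W8A32/W8A16 method names come from the overview above):

```python
import torch.nn as nn

# Hypothetical usage sketch: the import path, `quantize` method, and
# `method` argument are assumptions; only Quantizer, W8A32, and W8A16
# appear in the overview above.
from tinyq import Quantizer

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))

quantizer = Quantizer(method="w8a16")  # or "w8a32"
qmodel = quantizer.quantize(model)     # nn.Linear layers swapped for quantized ones

# qmodel now stores int8 weights and routes inference through the
# quantized forward pass functions.
```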
```mermaid
flowchart TD
    A0["Quantizer Class"]
    A1["Quantization Methods (W8A32, W8A16)"]
    A2["Custom Quantized Layers"]
    A3["Weight Quantization Math"]
    A4["Quantized Forward Pass Functions"]
    A5["Model Structure Replacement"]
    A6["Model Handling & Utilities"]

    A0 -- "Orchestrates" --> A5
    A0 -- "Applies" --> A1
    A5 -- "Installs" --> A2
    A2 -- "Uses for weights" --> A3
    A2 -- "Implemented via" --> A4
    A1 -- "Defines logic for" --> A4
    A6 -- "Supports workflow of" --> A0
```