# Observers Overview

An `Observer` in `llm-compressor` is a utility class responsible for analyzing tensors (e.g., weights, activations) and producing quantization parameters such as `scale` and `zero_point`. These observers are used by quantization modifiers to compute the statistics necessary for transforming tensors into lower-precision formats.

Observers are designed to be flexible and support a variety of quantization strategies, including per-tensor, per-group, per-channel, and per-token quantization.

## Base Class

### [Observer](../src/llmcompressor/observers/base.py)
Base class for all observers. Subclasses must implement the `calculate_qparams` method to define how quantization parameters are computed.

The base class handles:
- Group-wise scale/zero_point computation
- Token-wise and channel-wise quantization logic
- Optional support for `g_idx` (group index mappings)
- Recording observed tokens for logging and analysis
- Resetting internal state during lifecycle transitions

This class is not used directly but provides the scaffolding for all custom observers.

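For illustration, a custom observer could be registered and implement `calculate_qparams` along the lines of the sketch below. The `absmax` registry name, the exact hook signature, and the `self.quantization_args` attribute are assumptions made for this example; see [base.py](../src/llmcompressor/observers/base.py) for the authoritative interface.

```python
# Minimal sketch of a custom observer. The "absmax" name, the hook signature,
# and self.quantization_args are illustrative assumptions; see base.py for
# the authoritative interface.
from typing import Optional, Tuple

import torch

from llmcompressor.observers import Observer


@Observer.register("absmax")
class AbsMaxObserver(Observer):
    def calculate_qparams(
        self,
        observed: torch.Tensor,
        reduce_dims: Optional[Tuple[int, ...]] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        # Symmetric scale from the largest absolute value; zero_point stays 0
        if reduce_dims is not None:
            abs_max = observed.abs().amax(dim=reduce_dims, keepdim=True)
        else:
            abs_max = observed.abs().max()
        q_max = 2 ** (self.quantization_args.num_bits - 1) - 1
        scale = abs_max / q_max
        zero_point = torch.zeros_like(scale, dtype=torch.int64)
        return scale, zero_point
```
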
## Implemented Observers

### [MinMax](../src/llmcompressor/observers/min_max.py)
Computes `scale` and `zero_point` by tracking the minimum and maximum values of the observed tensor. This is the simplest and most common observer, and it works well for both symmetric and asymmetric quantization. A sketch of the underlying arithmetic follows the list below.

Best used for:
- Int8 or Int4 symmetric quantization
- Channel-wise or group-wise strategies

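To make the computation concrete, here is a minimal sketch of asymmetric min/max arithmetic in plain PyTorch. It illustrates the math only and is not the library's implementation; the real observer additionally handles grouping, strategies, and symmetric ranges.

```python
# Illustrative asymmetric min/max quantization parameters over a whole tensor.
# A sketch of the arithmetic only, not the library's implementation.
import torch


def minmax_qparams(x: torch.Tensor, num_bits: int = 8):
    q_min, q_max = 0, 2**num_bits - 1  # unsigned integer target range
    x_min = x.min().clamp(max=0.0)     # observed range must include zero
    x_max = x.max().clamp(min=0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    scale = scale.clamp(min=torch.finfo(torch.float32).eps)  # avoid zero scale
    zero_point = (q_min - x_min / scale).round().to(torch.int64)
    return scale, zero_point


scale, zero_point = minmax_qparams(torch.randn(64, 512))
```
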
### [MSE](../src/llmcompressor/observers/mse.py)
Computes quantization parameters by minimizing the mean squared error (MSE) between the original and quantized tensor. It can optionally maintain a moving average of min/max values for smoother convergence. A simplified sketch of the search loop follows the list below.

Best used when:
- Calibration accuracy is critical
- Quantization error needs to be tightly controlled

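The sketch below illustrates the kind of grid search such an observer performs and how the `maxshrink`, `grid`, `norm`, and `patience` parameters (documented in the table further down) can interact. It is a simplified illustration, not the library's implementation.

```python
# Simplified MSE-style search: shrink the observed range over up to
# maxshrink * grid steps, quantize-dequantize, and keep the candidate that
# minimizes the p-norm error; stop early after `patience` steps without
# improvement. Illustrative only, not the library's implementation.
import torch


def mse_qparams(x, num_bits=8, maxshrink=0.2, grid=100.0, norm=2.0, patience=5):
    q_min, q_max = 0, 2**num_bits - 1
    best_err, best, stalled = float("inf"), None, 0
    for i in range(int(maxshrink * grid)):
        shrink = 1.0 - i / grid
        x_min = (shrink * x.min()).clamp(max=0.0)
        x_max = (shrink * x.max()).clamp(min=0.0)
        scale = ((x_max - x_min) / (q_max - q_min)).clamp(min=1e-8)
        zero_point = (q_min - x_min / scale).round()
        q = (x / scale + zero_point).round().clamp(q_min, q_max)
        err = ((q - zero_point) * scale - x).abs().pow(norm).sum()
        if err < best_err:
            best_err, best, stalled = err, (scale, zero_point), 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return best


scale, zero_point = mse_qparams(torch.randn(64, 512))
```
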
## Quantization Strategies

Observers support multiple quantization strategies via the `QuantizationArgs.strategy` field; the example after this list illustrates how the strategy determines the shape of the returned parameters:

- `TENSOR`: A single global `scale` and `zero_point` for the entire tensor.
- `GROUP`, `TENSOR_GROUP`: Slice the tensor into equal-sized groups along the columns.
- `CHANNEL`: Per-channel quantization (e.g., across output dimensions).
- `TOKEN`: Quantize activations along the token or sequence dimension.
- `BLOCK`: *(Not yet implemented)* Placeholder for block-wise quantization.

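As a rough illustration, for a hypothetical weight of shape `(64, 512)` one would expect shapes along these lines (illustrative expectations, not asserted library output):

- `TENSOR`: a single scalar `scale`/`zero_point` pair shared by all `64 * 512` values.
- `CHANNEL`: one pair per output channel (row), i.e., shape `(64, 1)`.
- `GROUP` with `group_size=128`: `512 / 128 = 4` groups per row, i.e., shape `(64, 4)`.
- `TOKEN`: for activations of shape `(batch, seq, hidden)`, one pair per token.
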
## Observer Configuration Parameters

Observers can be configured with optional keyword arguments that control their behavior. These are passed through the `QuantizationArgs.observer_kwargs` dictionary and parsed internally when the observer is initialized.

Below are the supported configuration parameters, their default values, and their meanings (the search-related parameters apply to the `MSE` observer):

| Argument             | Default Value | Description |
|----------------------|---------------|-------------|
| `maxshrink`          | `0.20`        | Maximum fraction by which the observed min/max range is shrunk during the MSE search |
| `patience`           | `5`           | Number of consecutive search steps without improvement before stopping early |
| `averaging_constant` | `0.01`        | Weight given to the newest observation in the moving average of min/max values |
| `grid`               | `100.0`       | Granularity of the shrinkage search (number of candidate steps) |
| `norm`               | `2.0`         | Exponent of the error norm minimized during the search |

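The same parameters can also be set programmatically. The sketch below assumes that the `observer` and `observer_kwargs` fields on `QuantizationArgs` mirror the YAML usage shown at the end of this page:

```python
# Sketch: selecting the MSE observer and tuning it via observer_kwargs.
# Assumes QuantizationArgs exposes observer/observer_kwargs fields that mirror
# the YAML example below.
from compressed_tensors.quantization.quant_args import QuantizationArgs
from llmcompressor.observers import Observer

args = QuantizationArgs(
    num_bits=4,
    strategy="channel",
    observer="mse",
    observer_kwargs={"maxshrink": 0.1, "patience": 10},
)
observer = Observer.load_from_registry(args.observer, quantization_args=args)
```
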
## Example Usage

```python
import torch

from compressed_tensors.quantization.quant_args import QuantizationArgs
from llmcompressor.observers import Observer

# 4-bit group-wise quantization with groups of 128 columns
args = QuantizationArgs(num_bits=4, strategy="group", group_size=128)
observer = Observer.load_from_registry("minmax", quantization_args=args)

x = torch.randn(64, 512)
scale, zero_point = observer(x)
```

## Example YAML Usage

```yaml
quantization_stage:
  quantization_modifiers:
    GPTQModifier:
      weights:
        observer: mse
        observer_kwargs:
          maxshrink: 0.1
          patience: 10
          averaging_constant: 0.05
          grid: 128.0
          norm: 2.0
        num_bits: 4
        type: int
        symmetric: true
        strategy: channel
      targets:
        - Linear
```