
Commit bac039f

Update README.md (#1758)
1 parent 8d38814 commit bac039f

File tree: 1 file changed (+2 −0 lines)


torchao/quantization/README.md

Lines changed: 2 additions & 0 deletions
@@ -348,6 +348,8 @@ Marlin QQQ is an optimized GPU kernel that supports W4A8 mixed precision GEMM. F
 ### Gemlite Triton
 Int4 and Int8 quantization using the [Gemlite Triton](https://github.com/mobiusml/gemlite) kernels. You can try it out with the `quantize_` api as above alongside the constructor `gemlite_uintx_weight_only`. An example can be found in `torchao/_models/llama/generate.py`.
 
+Note: we test on gemlite 0.4.1, but any later version should work; we recommend the latest release to pick up the most recent performance improvements.
+
 ### UINTx Quantization
 We're trying to develop kernels for low-bit quantization for intx quantization formats. While the current performance is not ideal, we're hoping to continue to iterate on these kernels to improve their performance.
