We're also fortunate to be integrated into some of the leading open-source libraries
3. Mobius HQQ backend leveraged our int4 kernels to get [195 tok/s on a 4090](https://github.com/mobiusml/hqq#faster-inference)
4. [TorchTune](https://github.com/pytorch/torchtune) for our QLoRA and QAT recipes
5. [torchchat](https://github.com/pytorch/torchchat) for post-training quantization
6. SGLang for LLM serving: see the [usage docs](https://github.com/sgl-project/sglang/blob/4f2ee48ed1c66ee0e189daa4120581de324ee814/docs/backend/backend.md?plain=1#L83) and the integration [PR](https://github.com/sgl-project/sglang/pull/1341)
## Videos
* [Keynote talk at GPU MODE IRL](https://youtu.be/FH5wiwOyPX4?si=VZK22hHz25GRzBG1&t=1009)

If you find the torchao library useful, please cite it in your work as below.