Commit 7c186b8: Update README.md (1 parent: f262ec5)

README.md: 23 additions, 1 deletion
## This is a fork of qwopqwop200's repository meant for stable usage in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
This package uses [import redirection](https://github.com/jllllll/GPTQ-for-LLaMa-CUDA/blob/main/gptq_for_llama/__init__.py) to allow for easier integration with existing projects.
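The redirection mechanism can be illustrated with a minimal, self-contained sketch. This is not the repo's actual code (the names `alias_pkg`, `fast_backend`, `make_backend`, and `redirect` are hypothetical); it only shows the general technique of registering a chosen backend module in `sys.modules` so that later imports of the alias resolve to it:

```python
import sys
import types

# Minimal sketch of import redirection (hypothetical names, not the
# repo's actual code): register a chosen backend module under an alias
# in sys.modules. Python consults sys.modules before searching the
# filesystem, so a later `import alias_pkg` resolves to the backend.

def make_backend(label):
    backend = types.ModuleType("fast_backend")
    backend.which = label
    return backend

def redirect(alias, backend):
    sys.modules[alias] = backend

redirect("alias_pkg", make_backend("new"))
import alias_pkg  # served straight from sys.modules; no file lookup occurs

print(alias_pkg.which)  # -> new
```

A package's `__init__.py` can perform this registration at import time, which is why the override described below must happen before the first import.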
[Oobabooga's fork](https://github.com/oobabooga/GPTQ-for-LLaMa) is used by default when a compatible GPU is detected.
[qwopqwop200's 'cuda' branch](https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda) is used for GPUs older than Pascal.
AMD-compatible conversions of both are available courtesy of [WapaMario63](https://github.com/WapaMario63): [GPTQ-for-LLaMa-ROCm](https://github.com/WapaMario63/GPTQ-for-LLaMa-ROCm)
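The fork selection above can be sketched as a simple dispatch on CUDA compute capability (Pascal is compute capability 6.x). The function name here is hypothetical and the capability is passed in explicitly for illustration; the repo's real detection runs at import time, where the capability would come from something like `torch.cuda.get_device_capability(0)`:

```python
# Hedged sketch of the dispatch described above: choose a fork based on
# CUDA compute capability. Pascal GPUs report compute capability 6.x,
# so anything below major version 6 falls back to the 'old' kernels.
# The function name is illustrative, not the repo's actual API.

PASCAL_MAJOR = 6

def pick_fork(major, minor):
    """Return which fork to use for a GPU with the given compute capability."""
    if major >= PASCAL_MAJOR:
        return "new"   # Oobabooga's fork: Pascal and newer
    return "old"       # qwopqwop200's 'cuda' branch: pre-Pascal GPUs

# With torch installed, the capability would come from e.g.:
#   major, minor = torch.cuda.get_device_capability(0)
print(pick_fork(6, 1))  # Pascal (e.g. GTX 10xx) -> new
print(pick_fork(5, 2))  # Maxwell -> old
```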
Python modules can be imported as if they were in the main package, and the appropriate version will be selected:
```python
import gptq_for_llama.llama_inference_offload
from gptq_for_llama.modelutils import find_layers
from gptq_for_llama.quant import make_quant
```
This can be overridden by setting the `QUANT_CUDA_OVERRIDE` environment variable to either `old` or `new` before importing.
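For example (a minimal sketch; the mapping of `old` to the pre-Pascal branch and `new` to Oobabooga's fork is inferred from the defaults described above, and the package import is left commented so the snippet stands alone):

```python
import os

# The override must be set before the package is first imported, because
# the version is chosen in the package's __init__.py at import time.
# "old" presumably selects the pre-Pascal kernels; "new" Oobabooga's fork.
os.environ["QUANT_CUDA_OVERRIDE"] = "old"

# import gptq_for_llama.llama_inference_offload  # would now use the 'old' version
```

Setting the variable after the package has been imported has no effect on the already-loaded modules; use `switch_gptq` (below) for that case.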
There is also an experimental function for switching versions on the fly:
```python
from gptq_for_llama import switch_gptq
switch_gptq('new')
import gptq_for_llama.llama_inference_offload
```
Limited testing showed that versions can be swapped reliably; however, this may not work when models are swapped repeatedly.

# GPTQ-for-LLaMA
4-bit quantization of [LLaMA](https://arxiv.org/abs/2302.13971) using [GPTQ](https://arxiv.org/abs/2210.17323)
