From 71479889ecb275886df30c6c570c3bcd6c346cb5 Mon Sep 17 00:00:00 2001
From: R
Date: Sat, 9 Nov 2024 01:56:31 +0000
Subject: [PATCH 1/2] Update README.md

Add tl2 to the quant-type optional argument in the setup_env.py instructions

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4bbfc6aa..43043af3 100644
--- a/README.md
+++ b/README.md
@@ -150,7 +150,7 @@ optional arguments:
                         Directory to save/load the model
   --log-dir LOG_DIR, -ld LOG_DIR
                         Directory to save the logging info
-  --quant-type {i2_s,tl1}, -q {i2_s,tl1}
+  --quant-type {i2_s,tl1,tl2}, -q {i2_s,tl1,tl2}
                         Quantization type
   --quant-embd          Quantize the embeddings to f16
   --use-pretuned, -p    Use the pretuned kernel parameters

From 29109f35fde45da94b7153141aa954b798b74613 Mon Sep 17 00:00:00 2001
From: R
Date: Sat, 9 Nov 2024 18:42:53 +0000
Subject: [PATCH 2/2] Update README.md

Add tl2 to readme
Instruct to use pretuned kernels by default

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 43043af3..621bc9fd 100644
--- a/README.md
+++ b/README.md
@@ -130,14 +130,14 @@ pip install -r requirements.txt
 3. Build the project
 ```bash
 # Download the model from Hugging Face, convert it to quantized gguf format, and build the project
-python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s
+python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s -p
 # Or you can manually download the model and run with local path
 huggingface-cli download HF1BitLLM/Llama3-8B-1.58-100B-tokens --local-dir models/Llama3-8B-1.58-100B-tokens
-python setup_env.py -md models/Llama3-8B-1.58-100B-tokens -q i2_s
+python setup_env.py -md models/Llama3-8B-1.58-100B-tokens -q i2_s -p
 ```
-usage: setup_env.py [-h] [--hf-repo {1bitLLM/bitnet_b1_58-large,1bitLLM/bitnet_b1_58-3B,HF1BitLLM/Llama3-8B-1.58-100B-tokens}] [--model-dir MODEL_DIR] [--log-dir LOG_DIR] [--quant-type {i2_s,tl1}] [--quant-embd]
+usage: setup_env.py [-h] [--hf-repo {1bitLLM/bitnet_b1_58-large,1bitLLM/bitnet_b1_58-3B,HF1BitLLM/Llama3-8B-1.58-100B-tokens}] [--model-dir MODEL_DIR] [--log-dir LOG_DIR] [--quant-type {i2_s,tl1,tl2}] [--quant-embd]
                     [--use-pretuned]
 
 Setup the environment for running inference