When you call `model.save_pretrained_gguf(...)` from a directory whose path contains a space (for example, `Unsloth LLM`), the operation fails. The code builds its shell commands as raw interpolated strings, so when the shell parses them it splits on the space and treats each segment as a separate argument instead of recognizing the directory name as a single path.
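To make the failure mode concrete, here is a minimal sketch; the command string and script invocation are illustrative, not the exact code in `unsloth/save.py`. When the command is built by string interpolation and parsed with shell rules, the space in the path acts as an argument separator, whereas passing an argument list to `subprocess.run` avoids any splitting:

```python
import shlex
import subprocess
import sys

out_path = r"e:\DM\ML\Unsloth LLM\model\unsloth.Q8_0.gguf"  # contains a space

# The failing pattern (illustrative): build one command string by interpolation.
cmd = f"python convert_hf_to_gguf.py model --outfile {out_path}"

# Shell-style whitespace splitting turns the path into two arguments:
print(shlex.split(cmd, posix=False))
# ['python', 'convert_hf_to_gguf.py', 'model', '--outfile',
#  'e:\\DM\\ML\\Unsloth', 'LLM\\model\\unsloth.Q8_0.gguf']

# Robust pattern: pass an argument list, so the path (space included) reaches
# the child process as exactly one argument. This requires llama.cpp's
# converter script to actually be present; shown only to illustrate the fix.
subprocess.run(
    [sys.executable, "convert_hf_to_gguf.py", "model", "--outfile", out_path],
    check=True,
)
```

If a single command string has to be kept, quoting each interpolated path (for example with `shlex.quote` on POSIX shells, or explicit double quotes on Windows) should also work.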
```
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q8_0'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.
Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: [1] Converting model at model into q8_0 GGUF format.
The output location will be e:\DM\ML\Unsloth LLM\model\unsloth.Q8_0.gguf
This might take 3 minutes...
usage: convert_hf_to_gguf.py [-h] [--vocab-only] [--outfile OUTFILE]
                             [--outtype {f32,f16,bf16,q8_0,tq1_0,tq2_0,auto}]
                             [--bigendian] [--use-temp-file] [--no-lazy]
                             [--model-name MODEL_NAME] [--verbose]
                             [--split-max-tensors SPLIT_MAX_TENSORS]
                             [--split-max-size SPLIT_MAX_SIZE] [--dry-run]
                             [--no-tensor-first-split] [--metadata METADATA]
                             [--print-supported-models] [--remote] [--mmproj]
                             [model]
convert_hf_to_gguf.py: error: unrecognized arguments: LLM\model\unsloth.Q8_0.gguf
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[28], line 2
1 # Save to 8bit Q8_0
----> 2 if True: model.save_pretrained_gguf("model", tokenizer,)
3 # Remember to go to https://huggingface.co/settings/tokens for a token!
4 # And change hf to your username!
5 if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")
File e:\DM\ML\Unsloth LLM\.venv\Lib\site-packages\unsloth\save.py:1861, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
1858 is_sentencepiece_model = check_if_sentencepiece_model(self)
1860 # Save to GGUF
-> 1861 all_file_locations, want_full_precision = save_to_gguf(
1862 model_type, model_dtype, is_sentencepiece_model,
1863 new_save_directory, quantization_method, first_conversion, makefile,
1864 )
1866 # Save Ollama modelfile
1867 modelfile = create_ollama_modelfile(tokenizer, all_file_locations[0])
File e:\DM\ML\Unsloth LLM\.venv\Lib\site-packages\unsloth\save.py:1216, in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)
1205 raise RuntimeError(
1206 f"Unsloth: Quantization failed for {final_location}\n"\
1207 "You are in a Kaggle environment, which might be the reason this is failing.\n"\
(...) 1213 "I suggest you to save the 16bit model first, then use manual llama.cpp conversion."
1214 )
1215 else:
-> 1216 raise RuntimeError(
1217 f"Unsloth: Quantization failed for {final_location}\n"\
1218 "You might have to compile llama.cpp yourself, then run this again.\n"\
1219 "You do not need to close this Python program. Run the following commands in a new terminal:\n"\
1220 "You must run this in the same folder as you're saving your model.\n"\
1221 "git clone --recursive https://github.com/ggerganov/llama.cpp\n"\
1222 "cd llama.cpp && make clean && make all -j\n"\
1223 "Once that's done, redo the quantization."
1224 )
1225 pass
1226 pass
RuntimeError: Unsloth: Quantization failed for e:\DM\ML\Unsloth LLM\model\unsloth.Q8_0.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
```
Steps to Reproduce

1. Place your notebook or script in a directory whose path contains a space, for example `C:\Users\You\My Project\`.
2. In your Python code, call the following (or set the corresponding condition to `True` if you are using the given template):
   `model.save_pretrained_gguf("model", tokenizer)`
3. Execute the cells.
4. Observe the error:
   ```
   convert_hf_to_gguf.py: error: unrecognized arguments: LLM\model\unsloth.Q8_0.gguf
   RuntimeError: Unsloth: Quantization failed for e:\DM\ML\Unsloth LLM\model\unsloth.Q8_0.gguf
   ```
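A workaround that should sidestep the crash until the path handling is fixed (a sketch under the assumption that only paths containing spaces trigger the failure; the directory name below is just an example, and `model`/`tokenizer` come from the notebook):

```python
import os

# Hypothetical workaround: save from a working directory whose absolute path
# contains no spaces, so every path handed to convert_hf_to_gguf.py is clean.
work_dir = r"C:\temp\unsloth_work"  # example space-free path
os.makedirs(work_dir, exist_ok=True)
os.chdir(work_dir)

model.save_pretrained_gguf("model", tokenizer)  # model/tokenizer from the notebook
```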
Environment:
- OS: Windows (local)
- GPU: 1
- Python: 3.12.10
- PyTorch: 2.4.0+cu118
- Unsloth: 2025.4.8
- Transformers: 4.51.3
- TRL: 0.15.2
- Trainer: SFTTrainer
- Template: https://colab.research.google.com/drive/1T5-zKWM_5OD21QHwXHiV9ixTRR7k3iB9
Please let me know if anything else is needed. Thank you!