Replies: 3 comments · 4 replies
-
I've also encountered this, with a llama-2-70b-chat q3_k_m ggml, after converting the same way you did. I'll try a 70B GGUF from TheBloke (presumably converted directly from the PyTorch .bins) and report back if those are fine.
-
I don't think so; this looks like correct usage (for reference, I'm the one responsible for that conversion script). It copies the tensors over unchanged and takes the metadata and vocabulary from the original model; the tool makes no changes to the tensors, and the metadata/vocab handling just reuses existing code.

edit: Thanks @person4268 for testing. Based on what they saw, I'd recommend running llama.cpp directly against the model file that's giving you issues and seeing whether you can still reproduce the problem.
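For anyone following that suggestion, a minimal direct invocation of llama.cpp's `main` binary against the converted file might look like the sketch below. The model path and prompt are placeholders; adjust them to your own build and files.

```
# Run the converted GGUF directly with llama.cpp (paths are hypothetical)
./main -m ./models/converted-model.gguf \
  -p "Write a short paragraph about mountains." \
  -n 256
```

If the missing-word behaviour shows up here too, the problem is in the model file or llama.cpp itself rather than in the downstream frontend.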
-
I'm seeing occasional missing words using llava with llama.cpp, at git commit 254cfefa680a5eaebda36ed8664659f42108c6ff, with mostly default settings (e.g. batch=512). I do single-shot generation, so I run llava once for each prompt+image combination, generating 100-1000 words.
(I've also seen a missing word with --temp 0.5 --top-p 0.99 --min-p 0.01.) I'm using a converted model, llava-f16-13b_q8_0.gguf (converted to q8_0), from:
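For context, a single-shot run like the one described above would look roughly like this. The binary name and flags are assumptions based on llama.cpp's llava example at that time; the model, projector, and image paths are placeholders.

```
# One-shot llava generation for a single prompt+image pair (hypothetical paths)
./llava -m ./models/llava-f16-13b_q8_0.gguf \
  --mmproj ./models/mmproj-model-f16.gguf \
  --image ./input.jpg \
  -p "Describe this image in detail." \
  --temp 0.5 --top-p 0.99 --min-p 0.01
```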
-
I'm using a WIP branch of oobabooga/text-generation-webui, so there could of course be something else there that needs updating.
I converted airoboros-33b-gpt4-2.0.ggmlv3.q5_K_M.bin from TheBloke to GGUF using the Python script and the metadata from the original model on Hugging Face.
Example generated text by GGUF model:
It looks as if the model has forgotten the words "contrary" and "suffering", which would fit where words appear to be missing in the first and third sentences. This happens very regularly with the converted model, while the original GGML file works fine.
Log from converter script:
Anything I'm likely doing wrong during conversion? I'm only passing the --input, --output, and --model-metadata-dir parameters. The metadata dir has tokenizer_config.json, tokenizer.model, and config.json from https://huggingface.co/jondurbin/airoboros-33b-gpt4-2.0/tree/main
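As a quick sanity check after conversion, you can at least verify that the output file really is GGUF: the format starts with the 4-byte magic b"GGUF" followed by a little-endian uint32 format version. This small helper is my own sketch, not part of the conversion script:

```python
import struct

def check_gguf_header(path):
    """Return (magic_ok, version) for a file that should be GGUF.

    GGUF files begin with the 4-byte magic b"GGUF" followed by a
    little-endian uint32 format version number.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        version = struct.unpack("<I", f.read(4))[0]
    return magic == b"GGUF", version
```

This won't catch subtle tensor or vocab problems like the missing-words issue, but it rules out a truncated or mis-written output file.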