Closed
Labels: bug-unconfirmed, medium severity
Description
What happened?
Hello, I'm having a problem converting a safetensors model to BF16 using convert-hf-to-gguf.py. I can convert any model to f16 or q8_0, but I can't convert to bf16/auto because of the error in the attached log.
I'm running self-built llama.cpp from current master (commit c70d117c37cc7876e775d1e2722208a50c52edb3) with ROCm 6.0.2 and Python 3.12.4 on an up-to-date Arch Linux system (RX 7900 XT + Ryzen 9 5900X). I don't know if more details about my build matter here - I can provide them if necessary.
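Roughly, the failing invocation looks like this (a sketch - the model path is the one from the log below, and --outtype is the script's output-format flag; f16 and q8_0 succeed, bf16 and auto fail the same way):

python convert-hf-to-gguf.py /run/media/steelph0enix/bydlak/LLMs/Meta-Llama-3-8B-Instruct --outtype bf16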
Name and Version
version: 3244 (c70d117)
built with cc (GCC) 14.1.1 20240522 for x86_64-pc-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
INFO:gguf.gguf_writer:Meta-Llama-3-8B-Instruct.auto.gguf: n_tensors = 291, total_size = 16.1G
Writing: 0%| | 0.00/16.1G [00:00<?, ?byte/s]Traceback (most recent call last):
File "/home/steelph0enix/llama.cpp/convert-hf-to-gguf.py", line 3096, in <module>
main()
File "/home/steelph0enix/llama.cpp/convert-hf-to-gguf.py", line 3091, in main
model_instance.write()
File "/home/steelph0enix/llama.cpp/convert-hf-to-gguf.py", line 333, in write
self.gguf_writer.write_tensors_to_file(progress=True)
File "/home/steelph0enix/llama.cpp/gguf-py/gguf/gguf_writer.py", line 400, in write_tensors_to_file
ti.tensor.tofile(fout)
File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 233, in tofile
eager = LazyNumpyTensor.to_eager(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 193, in to_eager
return cls._recurse_apply(t, simple_to_eager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 109, in _recurse_apply
return fn(o)
^^^^^
File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 185, in simple_to_eager
lt._data = lt._func(lt._args)
^^^^^^^^^^^^^^^^^^
File "/home/steelph0enix/llama.cpp/gguf-py/gguf/lazy.py", line 158, in <lambda>
return cls(meta=cls.eager_to_meta(res), lazy=shared_lazy, args=args, func=lambda a: fn(*a, **kwargs))
^^^^^^^^^^^^^^^^
File "/home/steelph0enix/llama.cpp/gguf-py/gguf/quants.py", line 52, in __quantize_bf16_array
return __apply_over_grouped_rows(__compute_fp32_to_bf16, arr=n, otype=np.int16, oshape=n.shape)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steelph0enix/llama.cpp/gguf-py/gguf/quants.py", line 47, in __apply_over_grouped_rows
np.concatenate([func(group).ravel() for group in np.array_split(rows, n_groups)], axis=0, out=out)
^^^^^^^^^^^
File "/home/steelph0enix/llama.cpp/gguf-py/gguf/quants.py", line 30, in __compute_fp32_to_bf16
n = np.where((n & 0x7fffffff) > 0x7f800000, (n & 0xffff0000) | (64 << 16), n)
~~^~~~~~~~~~~~
OverflowError: Python integer 4294901760 out of bounds for int32
Writing: 0%| | 0.00/16.1G [00:02<?, ?byte/s]
Error: Failed to quantize model '/run/media/steelph0enix/bydlak/LLMs/Meta-Llama-3-8B-Instruct'.
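For context: the overflow happens where __compute_fp32_to_bf16 masks the int32 view of the tensor with 0xffff0000 (= 4294901760), which is not representable as a signed 32-bit integer. NumPy 1.x value-cast the Python int and silently promoted the operation; NumPy 2.x (NEP 50) raises OverflowError instead. Below is a minimal sketch of the failing pattern plus one possible workaround - an assumption on my side, not necessarily the project's actual fix - that reinterprets the bits as unsigned so every mask fits the dtype:

import numpy as np

# Bit patterns of float32 values viewed as signed 32-bit ints,
# mirroring __compute_fp32_to_bf16 in gguf-py/gguf/quants.py.
n = np.array([1.0, float("nan")], dtype=np.float32).view(np.int32)

# On NumPy >= 2.0 the next line raises
# "OverflowError: Python integer 4294901760 out of bounds for int32",
# because 0xffff0000 does not fit in int32 and NEP 50 no longer
# promotes the whole operation to int64 the way NumPy 1.x did:
# n = np.where((n & 0x7fffffff) > 0x7f800000, (n & 0xffff0000) | (64 << 16), n)

# Possible workaround (hypothetical): view the bits as unsigned 32-bit,
# where all three masks are representable, and do the same arithmetic.
u = n.view(np.uint32)
u = np.where((u & np.uint32(0x7fffffff)) > np.uint32(0x7f800000),
             (u & np.uint32(0xffff0000)) | np.uint32(64 << 16), u)
print(u)  # NaN gets a quiet-NaN bf16 bit pattern; 1.0 passes through unchanged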