Garbage output from whisper.cpp with Vulkan and quantized (not FP16) models #3047

Open
jagusztinl opened this issue Apr 14, 2025 · 0 comments

With Q8/Q5 quantized models, the output is garbage:
.\whisper-cli.exe -m ....\models\ggml-large-v3-turbo-q8_0.bin -f "out.mp3" -l en -of meeting -otxt
whisper_init_from_file_with_params_no_state: loading model from '....\models\ggml-large-v3-turbo-q8_0.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 7
whisper_model_load: qntvr = 2
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: Vulkan0 total size = 873.55 MB
whisper_model_load: model size = 873.55 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_init_state: kv self size = 10.49 MB
whisper_init_state: kv cross size = 31.46 MB
whisper_init_state: kv pad size = 7.86 MB
whisper_init_state: compute buffer (conv) = 37.67 MB
whisper_init_state: compute buffer (encode) = 212.29 MB
whisper_init_state: compute buffer (cross) = 9.25 MB
whisper_init_state: compute buffer (decode) = 100.03 MB

system_info: n_threads = 4 / 14 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

main: processing 'out.mp3' (67817473 samples, 4238.6 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:03.160] and
[00:00:03.160 --> 00:00:05.660] ,
[00:00:05.660 --> 00:00:11.780] ,
[00:00:11.780 --> 00:00:27.180] ,

The same run with the FP16 model produces correct output:
.\whisper-cli.exe -m ....\models\ggml-large-v3-turbo.bin -f "out.mp3" -l en -of meeting -otxt -t 6
whisper_init_from_file_with_params_no_state: loading model from '....\models\ggml-large-v3-turbo.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: Vulkan0 total size = 1623.92 MB
whisper_model_load: model size = 1623.92 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_init_state: kv self size = 10.49 MB
whisper_init_state: kv cross size = 31.46 MB
whisper_init_state: kv pad size = 7.86 MB
whisper_init_state: compute buffer (conv) = 37.67 MB
whisper_init_state: compute buffer (encode) = 212.29 MB
whisper_init_state: compute buffer (cross) = 9.25 MB
whisper_init_state: compute buffer (decode) = 100.03 MB

system_info: n_threads = 6 / 14 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

main: processing 'out.mp3' (67817473 samples, 4238.6 sec), 6 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:07.680] Talking about ecosystem from infrastructure architecture...
