
Garbage output from whisper.cpp with Vulkan and quantized (not FP16) models #3047

Open
@jagusztinl

Description


With quantized (Q8/Q5) models, the output is garbage:
.\whisper-cli.exe -m ....\models\ggml-large-v3-turbo-q8_0.bin -f "out.mp3" -l en -of meeting -otxt
whisper_init_from_file_with_params_no_state: loading model from '....\models\ggml-large-v3-turbo-q8_0.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 7
whisper_model_load: qntvr = 2
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: Vulkan0 total size = 873.55 MB
whisper_model_load: model size = 873.55 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_init_state: kv self size = 10.49 MB
whisper_init_state: kv cross size = 31.46 MB
whisper_init_state: kv pad size = 7.86 MB
whisper_init_state: compute buffer (conv) = 37.67 MB
whisper_init_state: compute buffer (encode) = 212.29 MB
whisper_init_state: compute buffer (cross) = 9.25 MB
whisper_init_state: compute buffer (decode) = 100.03 MB

system_info: n_threads = 4 / 14 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

main: processing 'out.mp3' (67817473 samples, 4238.6 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:03.160] and
[00:00:03.160 --> 00:00:05.660] ,
[00:00:05.660 --> 00:00:11.780] ,
[00:00:11.780 --> 00:00:27.180] ,

The same command with the FP16 model produces correct output:
.\whisper-cli.exe -m ....\models\ggml-large-v3-turbo.bin -f "out.mp3" -l en -of meeting -otxt -t 6
whisper_init_from_file_with_params_no_state: loading model from '....\models\ggml-large-v3-turbo.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: Vulkan0 total size = 1623.92 MB
whisper_model_load: model size = 1623.92 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_init_state: kv self size = 10.49 MB
whisper_init_state: kv cross size = 31.46 MB
whisper_init_state: kv pad size = 7.86 MB
whisper_init_state: compute buffer (conv) = 37.67 MB
whisper_init_state: compute buffer (encode) = 212.29 MB
whisper_init_state: compute buffer (cross) = 9.25 MB
whisper_init_state: compute buffer (decode) = 100.03 MB

system_info: n_threads = 6 / 14 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

main: processing 'out.mp3' (67817473 samples, 4238.6 sec), 6 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:07.680] Talking about ecosystem from infrastructure architecture...
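For completeness, a q8_0 model like the one above can be re-created from the working FP16 model with whisper.cpp's quantize tool (a sketch; the binary may be named `quantize` or `whisper-quantize` depending on the build, and the paths follow the runs above):

```shell
# Re-quantize the working FP16 model to q8_0. If a freshly quantized
# model shows the same garbage under Vulkan, a corrupted download can
# be ruled out and the problem is in the backend.
.\whisper-quantize.exe ....\models\ggml-large-v3-turbo.bin ....\models\ggml-large-v3-turbo-q8_0.bin q8_0
```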
