With the Q8/Q5 quantized models the output is garbage:
.\whisper-cli.exe -m ....\models\ggml-large-v3-turbo-q8_0.bin -f "out.mp3" -l en -of meeting -otxt
whisper_init_from_file_with_params_no_state: loading model from '....\models\ggml-large-v3-turbo-q8_0.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 7
whisper_model_load: qntvr = 2
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: Vulkan0 total size = 873.55 MB
whisper_model_load: model size = 873.55 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_init_state: kv self size = 10.49 MB
whisper_init_state: kv cross size = 31.46 MB
whisper_init_state: kv pad size = 7.86 MB
whisper_init_state: compute buffer (conv) = 37.67 MB
whisper_init_state: compute buffer (encode) = 212.29 MB
whisper_init_state: compute buffer (cross) = 9.25 MB
whisper_init_state: compute buffer (decode) = 100.03 MB
system_info: n_threads = 4 / 14 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
main: processing 'out.mp3' (67817473 samples, 4238.6 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:03.160] and
[00:00:03.160 --> 00:00:05.660] ,
[00:00:05.660 --> 00:00:11.780] ,
[00:00:11.780 --> 00:00:27.180] ,
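As a sanity check (not run above), the same quantized model can be forced onto the CPU backend with the -ng / --no-gpu flag; if the CPU output is correct, the problem is isolated to the Vulkan path:

.\whisper-cli.exe -m ....\models\ggml-large-v3-turbo-q8_0.bin -f "out.mp3" -l en -of meeting -otxt -ng

If I read the ggml-vulkan source correctly, there is also a GGML_VK_DISABLE_F16 environment variable that disables the FP16 shader path on the device; since the Intel iGPU reports fp16: 1 above, it may be worth setting before rerunning:

$env:GGML_VK_DISABLE_F16 = "1"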
For comparison, the same file with the FP16 model transcribes correctly:
.\whisper-cli.exe -m ....\models\ggml-large-v3-turbo.bin -f "out.mp3" -l en -of meeting -otxt -t 6
whisper_init_from_file_with_params_no_state: loading model from '....\models\ggml-large-v3-turbo.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: Vulkan0 total size = 1623.92 MB
whisper_model_load: model size = 1623.92 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_init_state: kv self size = 10.49 MB
whisper_init_state: kv cross size = 31.46 MB
whisper_init_state: kv pad size = 7.86 MB
whisper_init_state: compute buffer (conv) = 37.67 MB
whisper_init_state: compute buffer (encode) = 212.29 MB
whisper_init_state: compute buffer (cross) = 9.25 MB
whisper_init_state: compute buffer (decode) = 100.03 MB
system_info: n_threads = 6 / 14 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
main: processing 'out.mp3' (67817473 samples, 4238.6 sec), 6 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:07.680] Talking about ecosystem from infrastructure architecture...
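The two logs differ only in the model file type: ftype = 7 is Q8_0 (quantization version qntvr = 2) and ftype = 1 is F16 in ggml's file-type enum, so the regression tracks the quantized weights rather than the audio or decoding parameters. For reference, a Q8_0 file like the one above is normally produced from the F16 model with whisper.cpp's quantize example (whisper-quantize.exe in recent builds, quantize.exe in older ones); a sketch, assuming the model paths used above:

.\whisper-quantize.exe ....\models\ggml-large-v3-turbo.bin ....\models\ggml-large-v3-turbo-q8_0.bin q8_0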