
whisper.cpp doesn't run #537

@alosslessdev


zluda_trace logs (tarball/zip file)

log.zip

Description

When compiled with CUDA support, whisper.cpp does not run under ZLUDA: initialization fails with "CUDA driver version is insufficient for CUDA runtime version". The same build works without ZLUDA on an NVIDIA card.

whisper_init_from_file_with_params_no_state: loading model from '/home/asdf/Downloads/whisper.cpp-master/models/ggml-base.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_cuda_init: failed to initialize CUDA: CUDA driver version is insufficient for CUDA runtime version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3060 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 3
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      Vulkan0 total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_init_state: kv self size  =    6.29 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: compute buffer (conv)   =   17.24 MB
whisper_init_state: compute buffer (encode) =   85.88 MB
whisper_init_state: compute buffer (cross)  =    4.66 MB
whisper_init_state: compute buffer (decode) =   97.29 MB

Steps to reproduce

The steps below are taken from the whisper.cpp README:

First clone the repository:

git clone https://github.com/ggml-org/whisper.cpp.git

Navigate into the directory:

cd whisper.cpp

Then, download one of the Whisper models converted in ggml format. For example:

sh ./models/download-ggml-model.sh base.en

Now build the whisper-cli example and transcribe an audio file like this:

# build the project
cmake --fresh -B build -D WHISPER_FFMPEG=yes -DGGML_CUDA=1 -DGGML_VULKAN=1
cmake --build build -j --config Release

# transcribe an audio file
./build/bin/whisper-cli -f samples/jfk.wav

ZLUDA version

5

Operating System

EndeavourOS (rolling release)

GPU

AMD Radeon RX Vega 7
