Replies: 1 comment
-
I'm seeing these errors as well, despite having plenty of available memory. In my case I was able to narrow it down to a specific situation: two threads both trying to use Vulkan at the same time. It throws a segmentation fault whenever whisper.cpp tries to run while llama.cpp is still decoding.
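For anyone who wants to test whether the crash really comes from overlapping use, here is a minimal sketch of serializing the two workloads behind one lock. It is only an illustration of the idea, not a confirmed fix: `fake_gpu_work` is a hypothetical stand-in for a llama.cpp decode or a whisper.cpp transcription call, and `vulkan_lock` is a name made up for this example.

```python
import threading
import time

# Hypothetical process-wide lock guarding every call into the Vulkan backend.
vulkan_lock = threading.Lock()

active = 0        # workers currently inside the (simulated) backend
max_active = 0    # highest number of simultaneous workers observed
counter_lock = threading.Lock()

def fake_gpu_work(duration):
    """Stand-in for a llama.cpp decode or whisper.cpp run (assumption)."""
    global active, max_active
    with counter_lock:
        active += 1
        max_active = max(max_active, active)
    time.sleep(duration)  # simulate time spent on the GPU
    with counter_lock:
        active -= 1

def run_serialized(duration):
    # Only one thread at a time may enter the backend.
    with vulkan_lock:
        fake_gpu_work(duration)

threads = [threading.Thread(target=run_serialized, args=(0.01,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_active)  # prints 1: the workers never overlapped
```

If the segmentation fault disappears once the real calls are wrapped this way, that would support the concurrency explanation. The cost is that the two models can no longer run in parallel.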
-
Are there any limitations on the size and type of models supported by Vulkan on Windows?
Context:
I am unable to load a Llama3-8b-Q4_K_M.gguf model with Vulkan on either an NVIDIA 4070M (8 GB) or an AMD 780M with 24 GB available. The model loads fine with CUDA and HIP, but fails with Vulkan with:
load_tensors: layer 29 assigned to device Vulkan0
load_tensors: layer 30 assigned to device Vulkan0
load_tensors: layer 31 assigned to device Vulkan0
load_tensors: layer 32 assigned to device Vulkan0
load_tensors: tensor 'token_embd.weight' (q4_K) (and 0 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
ggml_vulkan: Device memory allocation of size 392101888 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_model_load: error loading model: unable to allocate Vulkan0 buffer
llama_model_load_from_file_impl: failed to load model
Vulkan info returns this for the NVIDIA 4070:
memoryHeaps: count = 2
memoryHeaps[0]:
size = 8334082048 (0x1f0c00000) (7.76 GiB)
budget = 7528775680 (0x1c0c00000) (7.01 GiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 1
MEMORY_HEAP_DEVICE_LOCAL_BIT
memoryHeaps[1]:
size = 25334833152 (0x5e612e000) (23.59 GiB)
budget = 24529528832 (0x5b612e800) (22.84 GiB)
usage = 278528 (0x00044000) (272.00 KiB)
flags:
None
and this for AMD 780M:
memoryHeaps: count = 3
memoryHeaps[0]:
size = 268435456 (0x10000000) (256.00 MiB)
budget = 255013680 (0x0f333330) (243.20 MiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 2
MEMORY_HEAP_DEVICE_LOCAL_BIT
MEMORY_HEAP_MULTI_INSTANCE_BIT
memoryHeaps[1]:
size = 25066209280 (0x5d6100000) (23.34 GiB)
budget = 23812898816 (0x58b5c0000) (22.18 GiB)
usage = 0 (0x00000000) (0.00 B)
flags:
None
memoryHeaps[2]:
size = 268435456 (0x10000000) (256.00 MiB)
budget = 255013680 (0x0f333330) (243.20 MiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 2
MEMORY_HEAP_DEVICE_LOCAL_BIT
MEMORY_HEAP_MULTI_INSTANCE_BIT
Any insights appreciated ...
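As a quick sanity check on the numbers above, this sketch compares the failed allocation size from the log with the budget of each heap flagged MEMORY_HEAP_DEVICE_LOCAL_BIT. It assumes the same 392101888-byte request applies on both machines (the log does not say which device produced it), and it ignores memory already allocated as well as any per-allocation driver limit, so treat it as a rough comparison only.

```python
MIB = 1024 ** 2

# From the log: "Device memory allocation of size 392101888 failed."
failed_alloc = 392101888

# Budgets (bytes) of the DEVICE_LOCAL heaps, copied from the vulkaninfo output.
device_local_budgets = {
    "NVIDIA 4070 heap 0": 7528775680,
    "AMD 780M heap 0": 255013680,
    "AMD 780M heap 2": 255013680,
}

def fits(alloc_bytes, budget_bytes):
    """True if a single allocation is no larger than the heap budget."""
    return alloc_bytes <= budget_bytes

results = {name: fits(failed_alloc, budget)
           for name, budget in device_local_budgets.items()}

for name, budget in device_local_budgets.items():
    print(f"{name}: alloc {failed_alloc / MIB:.2f} MiB, "
          f"budget {budget / MIB:.2f} MiB, fits={results[name]}")
```

On these figures the request is about 374 MiB, which is larger than either 256 MiB device-local heap on the AMD 780M but well within the NVIDIA heap budget on its own. So on the NVIDIA card the failure would have to come from something other than this one buffer's size, for example the total already allocated.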