Replies: 1 comment
-
Yes, by default llama.cpp will offload to each GPU a fraction of the model proportional to the amount of free memory available on that GPU, but you can also configure a different split with `--tensor-split`.
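For instance, a minimal sketch (not from the original reply): it assumes a CUDA build of llama.cpp, the `llama-cli` binary, and a placeholder model path; the `24,16` ratio is just an example matching a 24 GB + 16 GB pair.

```sh
# Offload all layers to GPU, splitting the model roughly 24:16
# between GPU 0 and GPU 1 (values are proportions, not gigabytes)
./llama-cli -m ./models/model.gguf -ngl 99 --tensor-split 24,16 -p "Hello"
```

The same `--tensor-split` (or `-ts`) option is accepted by `llama-server` as well.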
-
Hi. I have a 3090 (24 GB) and a 4080 (16 GB) at home, and thought I should try combining them to run bigger models.
I went to aphrodite and vllm first, since they are supposedly the go-tos for multi-GPU distribution, but both assume all GPUs have the same amount of VRAM available, so models won't load when I try to use both cards.
Does llama.cpp support an uneven split of GBs/layers between multiple GPUs?
(I have a slow-ish internet connection, so it took ages to download a big AWQ model. Thought I'd ask here before downloading a GGUF version.)