-
Big means that a single GPU is not able to hold all the parameters, even after quantization.
-
Yes ... you can use multi-GPU.
-
You can also offload only part of the model to the GPU(s) and run the rest on the CPU. Running a 70B LLaMA is possible on pure CPU with 64 GB RAM.
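As a rough sketch of partial offload with the llama.cpp CLI (the model file name and layer count here are illustrative, and the exact binary name can differ between builds; pick -ngl based on how many layers fit in your VRAM):

./main -m llama-2-70b.Q4_K_M.gguf -ngl 40 -p "Hello"

Omitting -ngl (it defaults to 0) keeps everything on the CPU, which is the pure-CPU 64 GB RAM case mentioned above.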
-
With the newest llama.cpp build and the 70B Q4_K_M model on an RTX 3090 you can put 48 of 80 layers on the GPU. I then get ~3 tokens/s.
-
This already works even with multiple dissimilar GPUs; for example, I'm using an rtx2070 and a p106-100. In case of multiple GPUs with different amounts of VRAM, you may have to fiddle a bit with the -ts parameter to fill the VRAM to the brim on all GPUs, but it works already. Communication between GPUs is not strictly necessary. There is a PR #2470 for enabling "nvlink like" GPU-to-GPU communication, but people are reporting varying results: sometimes it's faster and sometimes slower.
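For illustration, a sketch of combining -ngl with -ts (--tensor-split); the split ratio is an assumption roughly matching an 8 GB rtx2070 plus a 6 GB p106-100, and the model file name and layer count are hypothetical:

./main -m llama-2-70b.Q4_K_M.gguf -ngl 20 -ts 8,6 -p "Hello"

The -ts values are proportions, not gigabytes, so you may need to nudge them until both cards are filled close to the brim.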