
Is it possible to run a very big llama model? #2950

Answered by staviq
2catycm asked this question in Q&A

This already works, even with multiple dissimilar GPUs; for example, I'm using an RTX 2070 and a P106-100.

With multiple GPUs that have different amounts of VRAM, you may have to fiddle a bit with the -ts (tensor split) parameter to fill the VRAM to the brim on all GPUs, but it works.
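
As a rough sketch, assuming a hypothetical 70B GGUF model and two GPUs with roughly 8 GB and 6 GB of free VRAM (the model path and the numbers here are placeholders, not values from this thread):

```
# Offload all layers to the GPUs and split tensors ~8:6 between them.
# Tune the -ts ratios until both cards are filled as evenly as possible.
./main -m ./models/llama-2-70b.Q4_K_M.gguf -ngl 99 -ts 8,6 -p "Hello"
```

The values given to -ts are relative proportions rather than gigabytes, so -ts 8,6 and -ts 4,3 describe the same split.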

Communication between GPUs is not strictly necessary. There is PR #2470 for enabling "NVLink-like" GPU-to-GPU communication, but people are reporting varying results: sometimes it's faster, sometimes slower.

Answer selected by 2catycm