Llama multi GPU #3804
PaulaScholz started this conversation in Show and tell

I have Llama2 running under LlamaSharp (latest drop, 10/26) and CUDA 12. I took a screen capture of Task Manager while the model was answering questions and thought I'd share it as feedback. The system has 4 A6000 GPUs and 128 GB of system RAM. It works, and it also loads and runs the 70B models (albeit a bit more slowly). Although it uses all the GPUs, it puts most of the burden on GPU0.

I wanted to upload a larger video file, but the limit is 10 MB.

GPUPerf_Llama2_13bModel.mp4
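
The skew toward GPU0 is usually governed by llama.cpp's tensor-split settings, which LlamaSharp surfaces on its model parameters. Below is a minimal sketch of pinning an even split across the four A6000s, assuming a LlamaSharp build whose `ModelParams` exposes `GpuLayerCount`, `MainGpu`, and `TensorSplits` (exact property names and types vary between releases, and the model path is a placeholder):

```csharp
using LLama;
using LLama.Common;

// Sketch: pin the per-GPU split instead of relying on the default layer
// assignment. TensorSplits/MainGpu mirror llama.cpp's tensor_split/main_gpu;
// check your LLamaSharp version's ModelParams for the exact names and types.
var parameters = new ModelParams("models/llama-2-13b.Q4_K_M.gguf") // placeholder path
{
    GpuLayerCount = 41,                            // offload all layers (13B: 40 blocks + output)
    MainGpu = 0,                                   // GPU that holds the scratch buffers
    TensorSplits = new float[] { 1f, 1f, 1f, 1f }  // equal share of layers per GPU
};

using var weights = LLamaWeights.LoadFromFile(parameters);
```

Note that even with an even split, the main GPU still carries the scratch and KV overhead, so GPU0 sitting somewhat higher than the others is expected; GPU0 carrying nearly everything is not.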
Replies: 2 comments
- Something changed within the last week. I see this too on my 3x P40 setup: it tries to use GPU0 almost by itself, and I eventually get an OOM on the first prompt.
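If a recent change to the default split is the culprit, forcing the split manually is a possible workaround: with raw llama.cpp that is the `--tensor-split` flag (e.g. `--tensor-split 1,1,1` for three equal shares across the P40s) together with `--main-gpu`; the LlamaSharp equivalent is the `TensorSplits`/`MainGpu` sketch above.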