One model per GPU #786
Unanswered
WaleedAlfaris asked this question in Q&A
Hello,
I have a system with 4 CUDA-enabled GPUs, each with 16 GB of VRAM. I have a single API that loads the models into a pool and uses a queue to process queries first in, first out. I can successfully run 4 llama2-7B models on this system. However, when I do this, the models are split across the 4 GPUs automatically. Is there any way to specify which models are loaded on which devices? I would like to load each model fully onto a single GPU, with model 1 fully on GPU 0, model 2 on GPU 1, and so on, without splitting a single model across multiple GPUs. Is this possible?
When looking online, I found the
export CUDA_VISIBLE_DEVICES=1
command, but since I am loading all the models in a single script, this would limit all of the models to the visible GPUs and would still allocate them automatically. Unless there is a way to use the command in another way?
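One commonly cited workaround (not from this thread, and assuming the models are loaded with llama-cpp-python) is to give each model its own process and set `CUDA_VISIBLE_DEVICES` in that process's environment before CUDA is initialized; inside each worker the chosen GPU then shows up as device 0, so the model loads entirely onto it. A rough sketch, with illustrative worker and queue names:

```python
# Sketch of the per-process workaround: set CUDA_VISIBLE_DEVICES per worker
# *before* the model library initializes CUDA, so each worker sees only one
# GPU and loads its model fully onto it. Assumes llama-cpp-python; the
# worker/queue structure and model paths are placeholders.
import multiprocessing as mp
import os

MODEL_PATHS = ["model0.gguf", "model1.gguf", "model2.gguf", "model3.gguf"]  # placeholder paths

def worker(gpu_id, model_path, requests, results):
    # Must happen before anything that touches CUDA runs in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    from llama_cpp import Llama  # import inside the worker, after the env var is set

    llm = Llama(model_path=model_path, n_gpu_layers=-1)  # the chosen GPU is device 0 here
    while True:
        prompt = requests.get()  # FIFO: first in, first out
        if prompt is None:       # sentinel to shut the worker down
            break
        results.put(llm(prompt, max_tokens=256))

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # "spawn" so children don't inherit a CUDA context
    requests, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, path, requests, results))
             for i, path in enumerate(MODEL_PATHS)]
    for p in procs:
        p.start()
    # Feed prompts with requests.put(...), read completions from results,
    # and put one None per worker to stop them.
```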
Replies: 1 comment

Wish I had multiple GPUs to test it out, but have you tried the main_gpu param?
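If main_gpu is available in your build, a minimal sketch of pinning each instance to one device with llama-cpp-python (assuming the Llama constructor exposes main_gpu and tensor_split, which varies by version; the one-hot tensor_split is my assumption for keeping all layers on a single card):

```python
# Sketch (untested, assumes llama-cpp-python exposes main_gpu/tensor_split):
# main_gpu selects the primary device, and a one-hot tensor_split keeps all
# offloaded layers on that device instead of spreading them across the GPUs.
from llama_cpp import Llama

def load_on_gpu(model_path: str, gpu_id: int, n_gpus: int = 4) -> Llama:
    split = [0.0] * n_gpus
    split[gpu_id] = 1.0  # put 100% of the layers on the chosen GPU
    return Llama(
        model_path=model_path,
        n_gpu_layers=-1,     # offload all layers
        main_gpu=gpu_id,     # primary device (scratch/small tensors)
        tensor_split=split,  # proportion of the model assigned to each GPU
    )

models = [load_on_gpu(f"model{i}.gguf", gpu_id=i) for i in range(4)]  # placeholder paths
```

With one-hot tensor_split values like this, each instance should stay on its own card, which would avoid the automatic splitting across all four GPUs.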