Running multiple tiny models in parallel on a single GPU #2017
Unanswered
abarai-lanl asked this question in Q&A
Replies: 0
I have an NVIDIA Tesla GPU with 32 GB of VRAM. I can instantiate 10 instances of llama-cpp-python on that single GPU with Qwen3-0.6B, each in a different terminal session. My question is: do these models run in parallel if `create_chat_completion` is invoked concurrently in all 10 terminal sessions?
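For concreteness, here is a minimal sketch of what each of the 10 terminal sessions might run. The model path, context size, and prompt below are placeholder assumptions, not details from the question; it assumes llama-cpp-python was installed with CUDA support and that a GGUF build of Qwen3-0.6B is available locally.

```python
# One of 10 identical processes, each started in its own terminal session.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen3-0.6b-q4_k_m.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers to the single Tesla GPU
    n_ctx=2048,       # assumed context size
    verbose=False,
)

# The call whose concurrency behavior is being asked about.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

Each terminal runs this same script against the same GPU, so the question is whether the ten `create_chat_completion` calls actually execute on the GPU at the same time, or whether the driver serializes them.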