Running multiple tiny models in parallel on a single GPU #2017
Unanswered
abarai-lanl asked this question in Q&A
Replies: 0
I have an NVIDIA Tesla GPU with 32 GB of VRAM. I can instantiate 10 instances of llama-cpp-python on that single GPU with Qwen3-0.6B, each in a different terminal session. My question is: do these models run in parallel if `create_chat_completion` is invoked concurrently in all 10 terminal sessions?
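For concreteness, here is a minimal sketch of what each of the 10 terminal sessions might run. The model path, context size, and prompt below are placeholder assumptions, not details from the question; it assumes llama-cpp-python was installed with CUDA support and that a GGUF build of Qwen3-0.6B is available locally.

```python
# One of 10 identical processes, each started in its own terminal session.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen3-0.6b-q4_k_m.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers to the single Tesla GPU
    n_ctx=2048,       # assumed context size
    verbose=False,
)

# The call whose concurrency behavior is being asked about.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

Each terminal runs this same script against the same GPU, so the question is whether the ten `create_chat_completion` calls actually execute on the GPU at the same time, or whether the driver serializes them.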