Does HF TGI support Multi Node - Multi GPU server setup? #1561
Unanswered
ansSanthoshM asked this question in Q&A
Hi Team,

I have two machines, each with 4 NVIDIA GPUs. Each GPU has 46GB of VRAM, so each machine has 184GB of VRAM, and the two machines together form a cluster with 8 GPUs and 368GB of VRAM in total.

I now want to load two LLMs on this cluster: 1) Llama2-70B-Chat and 2) Llama2-70B-Code. Each of these LLMs consumes 168GB of VRAM, so loading both requires 336GB of VRAM in total. I am therefore considering a Multi Node - Multi GPU server configuration, i.e. 2 nodes with 4 GPUs each.

Is it possible to run a TGI server on this cluster configuration, so that I can create two Docker container endpoints, one per LLM, with both sharing the same hardware?
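Since each model (~168GB) fits within a single node's 184GB, one possible setup that sidesteps the open multi-node question is to run one single-node TGI container per machine, sharding each model across that machine's 4 local GPUs with TGI's standard `--num-shard` tensor-parallelism flag. The sketch below assumes the official TGI Docker image; the image tag, port, volume path, token variable, and the model IDs (in particular the code model) are placeholders, not a confirmed recipe:

```shell
# Node 1: Llama2-70B-Chat sharded across this machine's 4 GPUs
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-2-70b-chat-hf \
  --num-shard 4

# Node 2: the code model, likewise sharded across 4 GPUs
# (model ID is an assumption; substitute the actual 70B code model)
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id codellama/CodeLlama-70b-Instruct-hf \
  --num-shard 4
```

Clients would then hit node1:8080 for the chat model and node2:8080 for the code model, giving two endpoints on shared cluster hardware without sharding a single model across nodes.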
Replies: 2 comments

- Please let me know the developers' comments on this. It would help me decide my next course of action.

- Hi! We would like to know the same!