Running a single model on multiple CPU servers for faster inference? #3452
sidharthiimc started this conversation in Ideas
I have access to multiple Linux servers. Is there any way I can run one model across these servers and speed up the generation process? [I'm not referring to load distribution across calls, but distributing the model's layers across the servers might help.]
Has anyone tried anything on this?
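To make the idea concrete, here is a minimal sketch of the kind of layer split I mean: two stages of a toy model on two hosts, exchanging activations over plain TCP. This is purely illustrative, not llama.cpp code; the hidden size, layer count, port, and helper names are all made up.

```python
# Hypothetical sketch of "layer distribution" (pipeline parallelism) across
# two hosts, using plain numpy + TCP sockets. All sizes/names are assumptions.
# Usage: on server B:  python pp.py stage2
#        on server A:  python pp.py stage1 <hostB>
import pickle
import socket
import sys

import numpy as np

HIDDEN = 4096          # assumed hidden size (7B-class model)
LAYERS_PER_HOST = 16   # assumed: half of a 32-layer model per server
PORT = 5000            # arbitrary port for the second stage

rng = np.random.default_rng(0)
# Stand-in "layers": one weight matrix each (a real transformer block has
# attention + MLP, but the communication pattern is the same).
weights = [rng.standard_normal((HIDDEN, HIDDEN), dtype=np.float32) * 0.01
           for _ in range(LAYERS_PER_HOST)]

def run_layers(x: np.ndarray) -> np.ndarray:
    for w in weights:
        x = np.maximum(x @ w, 0.0)  # toy layer: matmul + ReLU
    return x

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def send_obj(sock, obj):
    data = pickle.dumps(obj)
    sock.sendall(len(data).to_bytes(8, "big") + data)

def recv_obj(sock):
    n = int.from_bytes(recv_exact(sock, 8), "big")
    return pickle.loads(recv_exact(sock, n))

if sys.argv[1] == "stage2":            # run this on server B
    srv = socket.create_server(("0.0.0.0", PORT))
    conn, _ = srv.accept()
    while True:
        x = recv_obj(conn)             # activations arriving from stage 1
        send_obj(conn, run_layers(x))  # run the second half of the "model"
else:                                  # run this on server A
    conn = socket.create_connection((sys.argv[2], PORT))
    x = rng.standard_normal((1, HIDDEN), dtype=np.float32)  # one "token"
    x = run_layers(x)                  # first half of the layers, locally
    send_obj(conn, x)                  # ship 1 x 4096 fp32 activations (~16 KB)
    print(recv_obj(conn).shape)
```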
Replies: 1 comment
-
MPI is currently being reworked, see #3334. Network throughput will always be a bottleneck in such configurations, though, and meaningful gains will likely only appear for really big models. For small models, synchronization between nodes takes significantly more time than the computation itself, so the nodes mostly end up waiting. I have a bunch of Linux servers too, and so far single-host inference still wins. I'm currently hunting for 10 Gbit or FC cards to see if they improve the situation; if I manage to get my hands on some, I'll let you know whether they make clustering more usable.
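To put rough numbers on the "mostly waiting" point, here is a back-of-envelope sketch; the link bandwidths, round-trip latencies, and model dimensions are assumptions for illustration, not measurements:

```python
# Rough back-of-envelope for why small models gain little (assumed numbers,
# not measurements). One token's hidden state for a 4096-dim model in fp16,
# exchanged at every layer boundary, on two common link types.
LAYERS = 32                              # assumed layer count
act_bytes = 4096 * 2                     # fp16 hidden state, one token
for name, bw, rtt in [("1 GbE", 125e6, 200e-6), ("10 GbE", 1.25e9, 50e-6)]:
    per_hop = act_bytes / bw + rtt       # transfer time + round-trip latency
    print(f"{name}: ~{per_hop * 1e6:.0f} us/hop, "
          f"~{per_hop * LAYERS * 1e3:.1f} ms/token if every layer syncs")
# 1 GbE: ~266 us/hop -> ~8.5 ms/token of pure communication, on the order of
# a small model's entire per-token compute. A 2-stage pipeline split pays far
# fewer hops, but then each half sits idle while the other one works.
```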