Running a single model on multiple CPU servers for faster inference? #3452
sidharthiimc started this conversation in Ideas
I have access to multiple Linux servers. Is there any way I can run one model across these servers and speed up the generation process? [I'm not referring to load distribution across calls, but distributing the model's layers across the servers might help.]
Has anyone tried anything on this?
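To make the idea concrete, here is a minimal sketch of the kind of layer split I mean: two stages of a toy model on two hosts, exchanging activations over plain TCP. This is purely illustrative, not llama.cpp code; the hidden size, layer count, port, and helper names are all made up.

```python
# Hypothetical sketch of "layer distribution" (pipeline parallelism) across
# two hosts, using plain numpy + TCP sockets. All sizes/names are assumptions.
# Usage: on server B:  python pp.py stage2
#        on server A:  python pp.py stage1 <hostB>
import pickle
import socket
import sys

import numpy as np

HIDDEN = 4096          # assumed hidden size (7B-class model)
LAYERS_PER_HOST = 16   # assumed: half of a 32-layer model per server
PORT = 5000            # arbitrary port for the second stage

rng = np.random.default_rng(0)
# Stand-in "layers": one weight matrix each (a real transformer block has
# attention + MLP, but the communication pattern is the same).
weights = [rng.standard_normal((HIDDEN, HIDDEN), dtype=np.float32) * 0.01
           for _ in range(LAYERS_PER_HOST)]

def run_layers(x: np.ndarray) -> np.ndarray:
    for w in weights:
        x = np.maximum(x @ w, 0.0)  # toy layer: matmul + ReLU
    return x

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def send_obj(sock, obj):
    data = pickle.dumps(obj)
    sock.sendall(len(data).to_bytes(8, "big") + data)

def recv_obj(sock):
    n = int.from_bytes(recv_exact(sock, 8), "big")
    return pickle.loads(recv_exact(sock, n))

if sys.argv[1] == "stage2":            # run this on server B
    srv = socket.create_server(("0.0.0.0", PORT))
    conn, _ = srv.accept()
    while True:
        x = recv_obj(conn)             # activations arriving from stage 1
        send_obj(conn, run_layers(x))  # run the second half of the "model"
else:                                  # run this on server A
    conn = socket.create_connection((sys.argv[2], PORT))
    x = rng.standard_normal((1, HIDDEN), dtype=np.float32)  # one "token"
    x = run_layers(x)                  # first half of the layers, locally
    send_obj(conn, x)                  # ship 1 x 4096 fp32 activations (~16 KB)
    print(recv_obj(conn).shape)
```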
Replies: 1 comment
-
MPI is currently being reworked, see #3334. Network throughput will always be a bottleneck in such configurations, though, and meaningful gains will likely only appear for really big models. For small models, synchronization between nodes takes significantly more time than the computation itself, so the nodes mostly end up waiting. I have a bunch of Linux servers too, and so far single-host inference still wins. I'm currently hunting for 10 Gbit or FC cards to see if they improve the situation; if I manage to get my hands on some, I'll let you know whether they make clustering more usable.
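To put rough numbers on the "mostly waiting" point, here is a back-of-envelope sketch; the link bandwidths, round-trip latencies, and model dimensions are assumptions for illustration, not measurements:

```python
# Rough back-of-envelope for why small models gain little (assumed numbers,
# not measurements). One token's hidden state for a 4096-dim model in fp16,
# exchanged at every layer boundary, on two common link types.
LAYERS = 32                              # assumed layer count
act_bytes = 4096 * 2                     # fp16 hidden state, one token
for name, bw, rtt in [("1 GbE", 125e6, 200e-6), ("10 GbE", 1.25e9, 50e-6)]:
    per_hop = act_bytes / bw + rtt       # transfer time + round-trip latency
    print(f"{name}: ~{per_hop * 1e6:.0f} us/hop, "
          f"~{per_hop * LAYERS * 1e3:.1f} ms/token if every layer syncs")
# 1 GbE: ~266 us/hop -> ~8.5 ms/token of pure communication, on the order of
# a small model's entire per-token compute. A 2-stage pipeline split pays far
# fewer hops, but then each half sits idle while the other one works.
```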