
Feature Request: Multiple Workers on one Machine #193

Open

MichaelFomenko opened this issue Apr 8, 2025 · 4 comments

Comments

@MichaelFomenko commented Apr 8, 2025

Hello, I have the following problem.

I'm encountering a performance bottleneck in my worker cluster. I currently have 4 workers distributed across multiple machines, but one worker is significantly slower than the others. This slow worker is impacting the performance of all the machines in the cluster.

I'm exploring ways to optimize resource utilization and avoid this bottleneck. Specifically, I'm wondering if it’s possible to:

- Concentrate multiple workers on a single, more powerful machine, while running fewer workers on another machine?
- Distribute workers across different hardware, such as running 6 workers on an RTX 3060 GPU, 1 worker on an integrated GPU, and 1 on the CPU?

My goal is to fully utilize the resources available on each machine and eliminate the bottleneck caused by the slowest worker, allowing all machines to operate efficiently.

Thank you for your great work.

@b4rtaz (Owner) commented Apr 8, 2025

It is now theoretically possible to run Distributed Llama with Vulkan on multiple GPUs (how to run it on an Nvidia GPU in Colab is described here; some of those commands may be useful). You need to run the root node and multiple workers on the same machine, each with a different --gpu-index <index> argument. I haven't tested it yet, but it's worth a try.
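
For illustration, a sketch of what that could look like on one machine with four GPUs (untested; the model/tokenizer paths, ports, and thread counts are placeholders, and the exact flags may depend on your build):

```sh
# Untested sketch: 1 root + 3 workers = 4 nodes (the node count must be a power of two).
# Each process is pinned to a different GPU via --gpu-index.
./dllama worker --port 9999 --nthreads 4 --gpu-index 1 &
./dllama worker --port 9998 --nthreads 4 --gpu-index 2 &
./dllama worker --port 9997 --nthreads 4 --gpu-index 3 &

# Root node on GPU 0, connecting to the three local workers:
./dllama inference --model model.m --tokenizer tokenizer.t \
  --buffer-float-type q80 --prompt "Hello" --steps 32 --nthreads 4 \
  --gpu-index 0 \
  --workers 127.0.0.1:9999 127.0.0.1:9998 127.0.0.1:9997
```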

A combination of CPU and GPU should also be possible. If you have built Distributed Llama with Vulkan support enabled, run the root node without --gpu-index <index> and the workers with --gpu-index <index>. Please note that you can run Distributed Llama only on 1, 2, 4, ... 2^n nodes.
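
A sketch of the CPU + GPU mix under the same assumptions (untested; placeholder paths): the root is started without --gpu-index, so it stays on the CPU, and the single worker runs on the GPU, giving 2 nodes in total:

```sh
# Worker on the GPU:
./dllama worker --port 9999 --nthreads 4 --gpu-index 0 &

# Root on the CPU (no --gpu-index); 1 root + 1 worker = 2 nodes.
./dllama inference --model model.m --tokenizer tokenizer.t \
  --buffer-float-type q80 --prompt "Hello" --steps 32 --nthreads 8 \
  --workers 127.0.0.1:9999
```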

BTW: you can list the available GPUs by executing: vulkaninfo --summary

@MichaelFomenko (Author)

You misunderstood my feature request. It's not about running one worker per device; it's about running multiple workers on one CPU or one GPU, because one GPU or CPU may be twice or several times more powerful than the other CPUs or GPUs in the cluster, and this imbalance becomes a bottleneck.

@b4rtaz (Owner) commented Apr 9, 2025

Currently what you can do is run, for example, 3 workers on the same GPU (you need to run 3 instances with the same --gpu-index) and 1 on the CPU (without --gpu-index). It is currently not possible to assign a bigger slice of the neural network to the CPU, so you can try it this way.
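
Assuming the "1 on the CPU" is the root node, which keeps the total at a power of two (1 root + 3 workers = 4 nodes), a sketch of this setup could look like this (untested; placeholder paths):

```sh
# Three workers share GPU 0 (same --gpu-index), each on its own port:
./dllama worker --port 9999 --nthreads 2 --gpu-index 0 &
./dllama worker --port 9998 --nthreads 2 --gpu-index 0 &
./dllama worker --port 9997 --nthreads 2 --gpu-index 0 &

# Root on the CPU (no --gpu-index):
./dllama inference --model model.m --tokenizer tokenizer.t \
  --buffer-float-type q80 --prompt "Hello" --steps 32 --nthreads 8 \
  --workers 127.0.0.1:9999 127.0.0.1:9998 127.0.0.1:9997
```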

@b4rtaz (Owner) commented Apr 9, 2025

@MichaelFomenko check out version 0.12.3; it fixes some problems on NVIDIA GPUs.
