
Feature Request: Multiple Workers on one Machine #193

Open

MichaelFomenko opened this issue Apr 8, 2025 · 4 comments

Comments

@MichaelFomenko commented Apr 8, 2025

Hello, I have the following problem.

I'm encountering a performance bottleneck in my worker cluster. I currently have 4 workers distributed across multiple machines, but one worker is significantly slower than the others. This slow worker is impacting the performance of all the machines in the cluster.

I'm exploring ways to optimize resource utilization and avoid this bottleneck. Specifically, I'm wondering if it’s possible to:

- Concentrate multiple workers on a single, more powerful machine, while running fewer workers on another machine?
- Distribute workers across different hardware, such as running 6 workers on an RTX 3060 GPU, 1 worker on an integrated GPU, and 1 on the CPU?

My goal is to fully utilize the resources available on each machine and eliminate the bottleneck caused by the slowest worker, allowing all machines to operate efficiently.

Thank you for your great work.

@b4rtaz (Owner) commented Apr 8, 2025

It is now theoretically possible to run Distributed Llama with Vulkan on multiple GPUs (how to run it on an Nvidia GPU in Colab is described here; some of those commands may be useful). You need to run the root node and multiple workers on the same machine, each with a different --gpu-index <index> argument. I haven't tested it yet, but it's worth a try.
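
For illustration, a sketch of what that could look like on one machine with four GPUs (untested; the model/tokenizer paths, ports, and thread counts are placeholders, and the exact flags may depend on your build):

```sh
# Untested sketch: 1 root + 3 workers = 4 nodes (the node count must be a power of two).
# Each process is pinned to a different GPU via --gpu-index.
./dllama worker --port 9999 --nthreads 4 --gpu-index 1 &
./dllama worker --port 9998 --nthreads 4 --gpu-index 2 &
./dllama worker --port 9997 --nthreads 4 --gpu-index 3 &

# Root node on GPU 0, connecting to the three local workers:
./dllama inference --model model.m --tokenizer tokenizer.t \
  --buffer-float-type q80 --prompt "Hello" --steps 32 --nthreads 4 \
  --gpu-index 0 \
  --workers 127.0.0.1:9999 127.0.0.1:9998 127.0.0.1:9997
```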

A combination of CPU and GPU should also be possible. If you have built Distributed Llama with Vulkan support enabled, run the root node without --gpu-index <index> and the workers with --gpu-index <index>. Please note that you can run Distributed Llama only on 1, 2, 4, ... 2^n nodes.
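
A sketch of the CPU + GPU mix under the same assumptions (untested; placeholder paths): the root is started without --gpu-index, so it stays on the CPU, and the single worker runs on the GPU, giving 2 nodes in total:

```sh
# Worker on the GPU:
./dllama worker --port 9999 --nthreads 4 --gpu-index 0 &

# Root on the CPU (no --gpu-index); 1 root + 1 worker = 2 nodes.
./dllama inference --model model.m --tokenizer tokenizer.t \
  --buffer-float-type q80 --prompt "Hello" --steps 32 --nthreads 8 \
  --workers 127.0.0.1:9999
```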

BTW: you can list the available GPUs by executing: vulkaninfo --summary

@MichaelFomenko (Author)

You misunderstood my feature request. It's not about running one worker per device; it's about running multiple workers on one CPU or one GPU, because one GPU or CPU may be twice or several times more powerful than the other CPUs or GPUs in the cluster, and this imbalance becomes a bottleneck.

@b4rtaz (Owner) commented Apr 9, 2025

Currently what you can do is run, for example, 3 workers on the same GPU (you need to run 3 instances with the same --gpu-index) and 1 on the CPU (without --gpu-index). It is currently not possible to assign a bigger slice of the neural network to the CPU, so you can try it this way.
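
Assuming the "1 on the CPU" is the root node, which keeps the total at a power of two (1 root + 3 workers = 4 nodes), a sketch of this setup could look like this (untested; placeholder paths):

```sh
# Three workers share GPU 0 (same --gpu-index), each on its own port:
./dllama worker --port 9999 --nthreads 2 --gpu-index 0 &
./dllama worker --port 9998 --nthreads 2 --gpu-index 0 &
./dllama worker --port 9997 --nthreads 2 --gpu-index 0 &

# Root on the CPU (no --gpu-index):
./dllama inference --model model.m --tokenizer tokenizer.t \
  --buffer-float-type q80 --prompt "Hello" --steps 32 --nthreads 8 \
  --workers 127.0.0.1:9999 127.0.0.1:9998 127.0.0.1:9997
```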

@b4rtaz (Owner) commented Apr 9, 2025

@MichaelFomenko check out version 0.12.3; it fixes some problems on NVIDIA GPUs.
