Feature Request: Multiple Workers on one Machine #193
Now it is theoretically possible to run Distributed Llama with Vulkan on multiple GPUs (here is described how to run on an Nvidia GPU in Colab; some commands may be useful). You need to run the root node and multiple workers on the same machine, each with a different port. A combination of CPU and GPU should also be possible: if you have built Distributed Llama with Vulkan support enabled and you want a node to stay on the CPU, run it without the --gpu-index argument. BTW: you can print the available GPUs by executing:
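For illustration, here is a minimal sketch of such a single-machine CPU-plus-GPU setup. The worker/inference subcommands and the --port, --nthreads, --workers, --model, --tokenizer and --buffer-float-type arguments follow typical Distributed Llama examples and may differ between versions; the model and tokenizer file names are placeholders; only --gpu-index is confirmed in this thread.

    # Worker on the GPU, listening on its own port (flag names assumed, see above)
    ./dllama worker --port 9998 --nthreads 4 --gpu-index 0

    # Root node on the same machine; omitting --gpu-index keeps it on the CPU
    ./dllama inference --model dllama_model_llama3_8b_q40.m \
      --tokenizer dllama_tokenizer_llama3.t \
      --buffer-float-type q80 --nthreads 4 \
      --prompt "Hello" --steps 32 \
      --workers 127.0.0.1:9998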
You did not understand my feature request. It is not about running one worker per device; it is about running multiple workers on one CPU or one GPU. One GPU or CPU can be twice (or many times) more powerful than the other CPUs or GPUs in the cluster, and this becomes a bottleneck.
Currently what you can do is run, for example, 3 workers on the same GPU (you need to run 3 instances with the same --gpu-index) and 1 on the CPU (without --gpu-index). It is currently not possible to assign a bigger slice of the neural network to the CPU. So you can try it this way.
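A rough sketch of that layout on one machine, with the same caveats about flag names and file names as in the sketch above. Here the CPU instance is assumed to be the root node, which also keeps the total node count at four (a power of two, which Distributed Llama, as far as I know, requires).

    # Three worker instances sharing one GPU, each on a different port
    ./dllama worker --port 9998 --nthreads 2 --gpu-index 0
    ./dllama worker --port 9999 --nthreads 2 --gpu-index 0
    ./dllama worker --port 10000 --nthreads 2 --gpu-index 0

    # Root node without --gpu-index, so it runs on the CPU
    ./dllama inference --model dllama_model_llama3_8b_q40.m \
      --tokenizer dllama_tokenizer_llama3.t \
      --buffer-float-type q80 --nthreads 4 \
      --prompt "Hello" --steps 32 \
      --workers 127.0.0.1:9998 127.0.0.1:9999 127.0.0.1:10000

Note that each node still receives an equal slice of the network, so the CPU node remains the pacing element; that is exactly the limitation described above.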
@MichaelFomenko check version 0.12.3; it fixes some problems on NVIDIA GPUs.
Hello, I have the following problem.
I'm encountering a performance bottleneck in my worker cluster. I currently have 4 workers distributed across multiple machines, but one worker is significantly slower than the others. This slow worker is impacting the performance of all the machines in the cluster.
I'm exploring ways to optimize resource utilization and avoid this bottleneck. Specifically, I'm wondering if it’s possible to:
Concentrate multiple workers on a single, more powerful machine, while running fewer workers on another machine?
Distribute workers across different hardware, such as running 6 workers on an RTX 3060 GPU, 1 worker on an integrated GPU, and 1 on the CPU?
My goal is to fully utilize the resources available on each machine and eliminate the bottleneck caused by the slowest worker, allowing all machines to operate efficiently.
Thank you for your great work.