network utilization #58
The test you are referring to was very unlucky. 2 Raspberry Pi 5 devices + a cheap switch achieve 80.11 ms / token. I think you underestimate the impact of setup quality on the result. In Google Cloud I achieved 8.56 ms / token (Llama 7B Q40), but the network there is probably faster than Gigabit Ethernet. The next thing is the transfer characteristics: a node calculates the result of its own slice, then synchronizes the result. So most of the time there is no transfer at all, and then suddenly some data needs to be transferred. In this case latency has a huge impact on the final transfer time.
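To illustrate the point about bursty synchronization, here is a minimal sketch of a per-token transfer model in which every synchronization pays a fixed latency plus the wire time of its payload. The function name, the number of synchronizations per token, the payload size, and the latency value are all hypothetical and are not taken from the project or from any measurement in this thread.

```python
def sync_time_ms(bytes_per_sync: float, n_syncs: int,
                 bandwidth_gbps: float, latency_ms: float) -> float:
    """Estimate per-token transfer time for n_syncs small bursts.

    Assumption (not from the project): each synchronization costs a fixed
    latency plus payload_bits / bandwidth, and bursts do not overlap.
    """
    wire_ms = bytes_per_sync * 8 / (bandwidth_gbps * 1e9) * 1000
    return n_syncs * (latency_ms + wire_ms)


# Hypothetical numbers: 64 synchronizations of 8 kB each on a 1 Gbps link.
bandwidth_only = sync_time_ms(8_000, 64, 1.0, latency_ms=0.0)
with_latency = sync_time_ms(8_000, 64, 1.0, latency_ms=0.2)
print(f"bandwidth only: {bandwidth_only:.3f} ms")
print(f"with 0.2 ms latency per sync: {with_latency:.3f} ms")
```

Under these assumed numbers the latency term dominates the total, which is why a bandwidth-only estimate can look far better than what a real setup achieves.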
I have included this factor: the network bandwidth is 20 Gbps, and the result is calculated in the issue content.
This seems simplified because the If we build the analysis on these statistics, the network bandwidth should still be enough. 1Gbps>>
I am using `tcpdump` to capture the real packet rate and analyze the actual bandwidth utilization. But I only have 4 Raspberry Pi devices, so I may need your help to run the 8-device experiment, which usually shows the highest latency, and share the logs. Thank you.
Let's calculate the transfer time theoretically.
llama3 8B
The original experiment data is here.
Since the transfer is full-duplex, there is no interference between uplink and downlink. So we can take the larger of the two, 510 kB, as the transfer data volume when calculating the transfer time. Over a 1 Gbps link, the average transfer time should therefore be 4.08 ms. However, your result is 199.60 ms, about 50 times higher, so the network utilization ratio is merely 2%.
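The arithmetic above can be reproduced with a short sketch. It assumes 1 kB = 1000 bytes and a 1 Gbps link (the value that makes 510 kB correspond to 4.08 ms); the function names are illustrative only.

```python
def transfer_time_ms(data_kb: float, bandwidth_gbps: float) -> float:
    """Ideal time to send data_kb kilobytes (1 kB = 1000 bytes) over the link."""
    bits = data_kb * 1000 * 8
    return bits / (bandwidth_gbps * 1e9) * 1000


def utilization(data_kb: float, bandwidth_gbps: float, measured_ms: float) -> float:
    """Ratio of the ideal transfer time to the measured transfer time."""
    return transfer_time_ms(data_kb, bandwidth_gbps) / measured_ms


ideal = transfer_time_ms(510, 1.0)     # 510 kB over an assumed 1 Gbps link
ratio = utilization(510, 1.0, 199.60)  # against the measured 199.60 ms
print(f"ideal: {ideal:.2f} ms, utilization: {ratio:.1%}")
```

The same two functions can be applied to the other setups in this issue by substituting their data volume, link speed, and measured transfer time.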
llama2 7B
For comparison, I summarize a similar model (llama2 7B) using different devices:

VMs
In this discussion, the network bandwidth is 20 Gbps (reference here), so the network utilization ratio is merely 3%.
Similarly, we can calculate the result of 4 VMs to be 6%.
RaspberryPi
Likewise, the results for the Raspberry Pi cluster are calculated to be 9.0%, 48.0%, and 14.1% for 2, 4, and 8 devices.
23.9%,25.75%, 9.8%
8.5%
Summary
I think the network utilization, averaging around 11% and ranging from 2% to 48%, is under-optimized. Improving the code could possibly ensure stable and high network utilization.
Originally posted by @zhengpeirong in #41 (reply in thread)