
Commit 8f81493

Add more detailed descriptions and correct sample configurations
1 parent c414c6b commit 8f81493

8 files changed (+140, -72 lines)


README.md

Lines changed: 110 additions & 4 deletions
```diff
@@ -1,5 +1,111 @@
-# ns3-rdma
-NS3 simulator for RDMA over Converged Ethernet v2 (RoCEv2), including the implementation of DCQCN, TIMELY, PFC, ECN and shared buffer switch
+# NS-3 simulator for RDMA
+This is an NS-3 simulator for RDMA over Converged Ethernet v2 (RoCEv2). It includes the implementation of DCQCN, TIMELY, PFC, ECN and a Broadcom shared buffer switch.
 
-# Note
-TIMELY module has not been merged into this yet. We are working on merging it. We will also add descriptions for this project soon.
+It is based on NS-3 version 3.17 and has been ported to the Visual Studio environment, as explained [here](https://www.nsnam.org/wiki/Ns-3_on_Visual_Studio_2012).
+
+## Note
+The TIMELY module has not been merged into this repository yet. We are working on merging it.
+
+## Quick Start
+
+### Build
+To compile it out of the box, you need Visual Studio.
+People have successfully built it with the *free* version,
+which can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=48146).
+Open windows/ns-3-dev/ns-3-dev.sln and build the whole solution.
+
+You may try building it with the original Makefile, etc. We did this a while back, but you will probably need to edit a few things to make it work now.
+
+### Run
+The binary will be generated at windows/ns-3-dev/x64/Release/main.exe.
+We include a sample configuration file at windows/ns-3-dev/x64/Release/mix/config.txt.
+Execute main.exe in windows/ns-3-dev/x64/Release/:
+```
+cd windows\ns-3-dev\x64\Release\
+main.exe mix\config.txt
+```
+
+It runs a 2:1 incast at 40Gbps for 1 second. Please allow a few minutes for it to finish.
+The trace will be generated at mix/mix.tr, as defined by mix/config.txt.
+
+There are quite a few options in mix/config.txt. We will gradually add documentation.
+In the meantime, you can check the code
+(project "main" -- source files -- "third.cc") to see how these options are parsed.
+You can also raise issues if you have any questions.
+
```
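The sample config.txt in this commit is a list of `KEY VALUE` lines, and third.cc is where they are parsed. As an illustration only, here is a minimal C++ sketch of reading such a file into a map. `ParseConfig` is a hypothetical helper, not the project's parser, and it assumes one whitespace-separated key and value per line.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not the parser in third.cc): reads "KEY VALUE" lines.
// Lines without both a key and a value (e.g. blank lines) are skipped, extra
// tokens after the value are ignored, and a repeated key keeps its last value.
// Values stay as strings ("1", "40Mb/s", "mix/mix.tr").
std::map<std::string, std::string> ParseConfig(std::istream& in) {
  std::map<std::string, std::string> options;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    std::string value;
    if (fields >> key >> value) {
      options[key] = value;
    }
  }
  return options;
}
```

A caller would then convert individual values as needed, e.g. `std::stod(opts.at("PAUSE_TIME"))`; values carrying units such as `40Mb/s` would need their own parsing.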
```diff
+## What did we add exactly?
+
+**point-to-point/model/qbb-net-device.cc** and all other qbb-* files:
+
+DCQCN and PFC implementation.
+It also includes go-back-to-N and go-back-to-0, which handle packet drops due to corruption.
+
+In 2013, we got a very basic NS-3 PFC implementation from somewhere and developed ours based on it.
+We cannot find the original repository anymore.
+
```
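Go-back-to-N versus go-back-to-0 recovery can be illustrated with a small receiver model. This is a hypothetical sketch, not code from qbb-net-device.cc: the receiver accepts only the next expected sequence number and NAKs when it sees a gap; after a NAK, go-back-N resumes from the receiver's expected sequence number, while go-back-0 restarts the message from its first packet (sequence 0 here). `GoBackReceiver` and `ResumeSequence` are illustrative names. The sample configuration in this commit also adds an `L2_BACK_TO_ZERO` option, which we assume (without having checked the code) selects between the two modes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of go-back-N / go-back-0 loss recovery;
// not code from qbb-net-device.cc.
enum class RecvAction { kAccept, kNak, kDuplicate };

struct GoBackReceiver {
  uint32_t expected = 0;  // next in-order sequence number

  RecvAction OnPacket(uint32_t seq) {
    if (seq == expected) {
      ++expected;  // in order: deliver and advance
      return RecvAction::kAccept;
    }
    if (seq > expected) {
      return RecvAction::kNak;  // gap: an earlier packet was dropped; discard this one
    }
    return RecvAction::kDuplicate;  // already delivered (e.g. a retransmission)
  }
};

// Where the sender resumes after a NAK carrying the receiver's expected sequence.
uint32_t ResumeSequence(uint32_t nakExpected, bool backToZero) {
  return backToZero ? 0 : nakExpected;  // go-back-0 restarts the whole message
}
```

Go-back-N only resends from the first missing packet, whereas go-back-0 discards all progress on the message, which is why it is much more expensive when corruption drops are frequent.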
```diff
+**network/model/broadcom-node.cc** and **.h**:
+
+This implements a Broadcom ASIC switch model, which
+mostly performs buffer-threshold-related operations. These include deciding
+whether PFC should be triggered, whether ECN should be marked, whether the buffer is so full that packets
+should be dropped, etc. It supports both static and dynamic thresholds for PFC.
+
+*Disclaimer: this module is purely based on the authors' personal understanding of Broadcom ASIC. It does not reflect any official confirmation from either Microsoft or Broadcom.*
+
```
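Two of the threshold decisions described above can be sketched as small functions. Both are assumptions about the general technique, not this module's code: `ShouldPause` uses a dynamic threshold under which a queue may grow to `alpha` times the free shared buffer, and `EcnMarkProbability` uses RED-style marking that rises linearly from 0 at `kmin` to `pmax` at `kmax` and marks every packet above `kmax`. The sample configuration's `USE_DYNAMIC_PFC_THRESHOLD`, `KMIN`, `KMAX` and `PMAX` look like the corresponding knobs, but their exact semantics and units are not stated here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; assumed semantics, not broadcom-node.cc.
// Dynamic threshold: a queue may occupy at most alpha * (free shared buffer).
bool ShouldPause(uint64_t queueBytes, uint64_t sharedUsedBytes,
                 uint64_t sharedTotalBytes, double alpha) {
  const uint64_t freeBytes =
      sharedTotalBytes > sharedUsedBytes ? sharedTotalBytes - sharedUsedBytes : 0;
  return static_cast<double>(queueBytes) > alpha * static_cast<double>(freeBytes);
}

// RED-style ECN marking: 0 up to kmin, linear up to pmax at kmax, 1 above kmax.
// queueLen, kmin and kmax must use the same unit.
double EcnMarkProbability(double queueLen, double kmin, double kmax, double pmax) {
  if (queueLen <= kmin) return 0.0;
  if (queueLen > kmax) return 1.0;
  return pmax * (queueLen - kmin) / (kmax - kmin);
}
```

The appeal of a dynamic threshold is that each queue's allowance shrinks automatically as the shared buffer fills, so a few congested ports cannot monopolize it.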
```diff
+**network/utils/broadcom-egress-queue.cc** and **.h**:
+
+This is the actual MMU that buffers packets.
+It also includes the switch scheduler, i.e., when the upper layer asks for a packet to send, it
+decides which queue to dequeue from. Strategies such as strict priority and round robin are supported.
+
```
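To make the two scheduling strategies concrete, here is a hypothetical sketch of choosing the next queue to dequeue. It assumes that queue 0 has the highest priority and that a queue paused by PFC is skipped; neither detail is confirmed by the description above, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scheduler sketch; not broadcom-egress-queue.cc.
// queueLen[i] = packets waiting in queue i; paused[i] = queue i blocked by PFC.
// Both vectors are assumed to have the same size. Returns -1 if nothing is eligible.
int PickStrictPriority(const std::vector<int>& queueLen, const std::vector<bool>& paused) {
  // Lower index = higher priority (assumption).
  for (std::size_t i = 0; i < queueLen.size(); ++i) {
    if (queueLen[i] > 0 && !paused[i]) return static_cast<int>(i);
  }
  return -1;
}

// Round robin: start after the last served queue and wrap around.
// Pass lastServed = -1 before anything has been served.
int PickRoundRobin(const std::vector<int>& queueLen, const std::vector<bool>& paused,
                   int lastServed) {
  const int n = static_cast<int>(queueLen.size());
  for (int k = 1; k <= n; ++k) {
    const int i = (lastServed + k) % n;
    if (queueLen[i] > 0 && !paused[i]) return i;
  }
  return -1;
}
```

Strict priority can starve low-priority queues under sustained load, while round robin gives each non-empty, unpaused queue a turn.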
```diff
+**applications/model/udp-echo-client.cc**:
+
+We implement the RDMA client here, which aligns
+with the fact that RoCEv2 includes a UDP header. In particular, the original UDP client has trouble
+when PFC pauses the link: it keeps sending packets at line rate, so it soon
+builds up a huge queue and memory runs out. Here we throttle the sending rate if it gets
+pushed back by PFC.
+
```
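A hypothetical sketch of two ingredients such throttling could use: pacing packets at a target rate, and holding back when the local backlog stops draining (as it does while PFC pauses the link). `InterPacketGapNs`, `CanSend` and the backlog limit are assumptions for illustration, not the client's actual logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pacing helpers; not udp-echo-client.cc.
// Time in nanoseconds to serialize one packet at rateBps (must be non-zero).
uint64_t InterPacketGapNs(uint32_t packetBytes, uint64_t rateBps) {
  return static_cast<uint64_t>(packetBytes) * 8ULL * 1000000000ULL / rateBps;
}

// Hand another packet to the NIC only while its backlog is below a limit.
// While PFC pauses the link the backlog stops draining, so the client stops
// too instead of growing an unbounded queue.
bool CanSend(uint64_t nicBacklogBytes, uint64_t backlogLimitBytes) {
  return nicBacklogBytes < backlogLimitBytes;
}
```

For example, at 40 Gbps a 1000-byte packet takes 200 ns to serialize (ignoring header bytes), so a paced sender would schedule its next packet 200 ns later, but only if `CanSend` allows it.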
```diff
+**internet/model/seq-ts-header.cc** and **.h**:
+
+We didn't implement the full InfiniBand
+header. Instead, what we really need is just the sequence number (for detecting corruption
+drops, and also to help us understand the throughput) and the timestamp (required by TIMELY).
+This is where we encode this information into packets.
+
```
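For illustration, the sketch below packs a 32-bit sequence number and a 64-bit timestamp into 12 bytes in network byte order, which mirrors our understanding of the layout of ns-3's stock `SeqTsHeader`. Whether this project's modified header uses exactly this layout and a nanosecond timestamp is an assumption; `SeqTs`, `Serialize` and `Deserialize` are standalone helpers, not ns-3 APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Standalone sketch of a 12-byte sequence/timestamp header (big-endian).
struct SeqTs {
  uint32_t seq;   // per-flow sequence number
  uint64_t tsNs;  // send timestamp in nanoseconds (assumed unit)
};

std::array<uint8_t, 12> Serialize(const SeqTs& h) {
  std::array<uint8_t, 12> buf{};
  for (int i = 0; i < 4; ++i) buf[i] = static_cast<uint8_t>(h.seq >> (8 * (3 - i)));
  for (int i = 0; i < 8; ++i) buf[4 + i] = static_cast<uint8_t>(h.tsNs >> (8 * (7 - i)));
  return buf;
}

SeqTs Deserialize(const std::array<uint8_t, 12>& buf) {
  SeqTs h{0, 0};
  for (int i = 0; i < 4; ++i) h.seq = (h.seq << 8) | buf[i];
  for (int i = 0; i < 8; ++i) h.tsNs = (h.tsNs << 8) | buf[4 + i];
  return h;
}
```

A receiver can detect a drop when `seq` skips ahead, and a sender that gets the timestamp echoed back can take a delay sample as the current time minus `tsNs`.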
```diff
+**main/third.cc**:
+
+The main() function.
+
+There may be other edits here and there; in particular, trace generation is scattered
+among various network stacks. But the above are the major ones.
+
+## Q&A
+
+**Q: Why did you port it to Windows?**
+
+A: This is a Microsoft project. Visual Studio, including the free version, works well.
+
+**Q: Fine. What if I want to run it on Linux, and do not want to spend time changing the build process?**
+
+A: You can build it using Visual Studio and run the .exe using WINE. We have tested WINE 1.6.2 and it works well.
+
+**Q: I don't understand ... (some part of the code or configuration)**
+
+A: Raise issues on GitHub, so that your questions can also help others. If you really do
+not want others to know you are working on this, you can email yibzh@microsoft.com
+
+**Q: What papers should I cite, if I also publish?**
+
+A: Below are the ones you should definitely check. They are ranked from most to least
+relevant. That said, all of them are quite relevant:
+
+*ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY*, CoNEXT'16 (this project was released with this paper; if you use this code, we ask you to at least cite this paper.)
+
+*Congestion Control for Large-scale RDMA Deployments*, SIGCOMM'15 (DCQCN)
+
+*TIMELY: RTT-based Congestion Control for the Datacenter*, SIGCOMM'15 (TIMELY)
+
+*RDMA over Commodity Ethernet at Scale*, SIGCOMM'16 (discussed go-back-to-N)
+
+*Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them*, HotNets'16 (PFC deadlock analysis; directly used this simulator.)
```

windows/ns-3-dev/mix/flow_tcp.txt

Lines changed: 0 additions & 24 deletions
This file was deleted.

windows/ns-3-dev/mix/topology.txt

Lines changed: 0 additions & 29 deletions
This file was deleted.
Lines changed: 17 additions & 11 deletions
```diff
@@ -1,33 +1,39 @@
 ENABLE_QCN 1
 USE_DYNAMIC_PFC_THRESHOLD 1
 PACKET_LEVEL_ECMP 0
-FLOW_LEVEL_ECMP 0
+FLOW_LEVEL_ECMP 1
 
 PAUSE_TIME 5
 PACKET_PAYLOAD_SIZE 1000
 
-TOPOLOGY_FILE C:\ns-3-win2\windows\ns-3-dev\mix\topology.txt
-FLOW_FILE C:\ns-3-win2\windows\ns-3-dev\mix\flow.txt
-TCP_FLOW_FILE C:\ns-3-win2\windows\ns-3-dev\mix\flow_tcp.txt
-TRACE_FILE C:\ns-3-win2\windows\ns-3-dev\mix\trace.txt
-TRACE_OUTPUT_FILE Z:\mix.tr
+TOPOLOGY_FILE mix/topology.txt
+FLOW_FILE mix/flow.txt
+TCP_FLOW_FILE mix/flow_tcp_0.txt
+TRACE_FILE mix/trace.txt
+TRACE_OUTPUT_FILE mix/mix.tr
 
 SEND_IN_CHUNKS 0
 APP_START_TIME 1.0
 APP_STOP_TIME 10.0
-SIMULATOR_STOP_TIME 2.05
+SIMULATOR_STOP_TIME 3.01
 
 CNP_INTERVAL 50
 ALPHA_RESUME_INTERVAL 55
 NP_SAMPLING_INTERVAL 0
 CLAMP_TARGET_RATE 1
 CLAMP_TARGET_RATE_AFTER_TIMER 0
 RP_TIMER 60
-BYTE_COUNTER 300000
+BYTE_COUNTER 300000000
 DCTCP_GAIN 0.00390625
-KMAX 1000
-KMIN 40
-PMAX 1.0
+KMAX 1000
+KMIN 40
+PMAX 1.0
 FAST_RECOVERY_TIMES 5
 RATE_AI 40Mb/s
 RATE_HAI 200Mb/s
+
+ERROR_RATE_PER_LINK 0.0000
+L2_CHUNK_SIZE 4000
+L2_WAIT_FOR_ACK 0
+L2_ACK_INTERVAL 256
+L2_BACK_TO_ZERO 0
```
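Several of these parameters appear to correspond to the DCQCN reaction-point algorithm from the SIGCOMM'15 paper: `DCTCP_GAIN` to the gain g (here 1/256), `RATE_AI` to the additive-increase step, and `FAST_RECOVERY_TIMES` to the number of fast-recovery stages. That mapping is our reading of the names, not verified against third.cc. The hypothetical class below sketches the paper's rate cut on a CNP, the alpha decay, and fast recovery followed by additive increase; hyper increase (`RATE_HAI`) and the separate timer and byte-counter stage tracking are simplified into a single stage counter.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical DCQCN reaction point following the SIGCOMM'15 description;
// not the qbb-net-device.cc code. Rates are in bit/s.
class DcqcnReactionPoint {
 public:
  DcqcnReactionPoint(double lineRate, double g, double rateAi, int fastRecoveryTimes)
      : lineRate_(lineRate), g_(g), rateAi_(rateAi),
        fastRecoveryTimes_(fastRecoveryTimes), rc_(lineRate), rt_(lineRate) {}

  // CNP received: remember the current rate as the target, cut the current
  // rate by alpha/2, and raise the congestion estimate alpha.
  void OnCnp() {
    rt_ = rc_;
    rc_ *= 1.0 - alpha_ / 2.0;
    alpha_ = (1.0 - g_) * alpha_ + g_;
    stage_ = 0;
  }

  // Alpha timer expired without a CNP: decay alpha.
  void OnAlphaTimer() { alpha_ *= 1.0 - g_; }

  // Rate-increase event (timer or byte counter). The first fastRecoveryTimes_
  // events only move halfway back to the target rate; later ones also raise
  // the target by rateAi_ (additive increase), capped at line rate.
  void OnIncreaseEvent() {
    if (stage_ >= fastRecoveryTimes_) rt_ = std::min(rt_ + rateAi_, lineRate_);
    rc_ = (rt_ + rc_) / 2.0;
    ++stage_;
  }

  double CurrentRate() const { return rc_; }
  double Alpha() const { return alpha_; }

 private:
  double lineRate_, g_, rateAi_;
  int fastRecoveryTimes_;
  double rc_, rt_;
  double alpha_ = 1.0;
  int stage_ = 0;
};
```

With alpha starting at 1, the first CNP halves a 40 Gbps sender to 20 Gbps, and the next fast-recovery step brings it back to 30 Gbps by closing half the gap to the 40 Gbps target.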
Lines changed: 2 additions & 4 deletions
```diff
@@ -1,9 +1,7 @@
-4
+2
 2 1 3 10000000 2.0 9.5
 3 1 3 10000000 2.0 9.5
-4 1 3 10000000 2.0 9.5
-5 1 3 10000000 2.0 9.5
 
 
 First line: flow #
-src dst pg packet#
+src dst priority packet# start_time end_time
```
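Per the updated description, the flow file starts with the number of flows, followed by one `src dst priority packet# start_time end_time` line per flow. Below is a hypothetical reader for that format; the field names, the assumption that times are in seconds, and the reading of `packet#` as a packet count are ours. It stops after the declared number of flows, so the trailing description lines are never parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical reader for the flow file format; not the code in third.cc.
struct FlowSpec {
  uint32_t src, dst, priority;
  uint64_t packets;           // "packet#" column (assumed to be a packet count)
  double startTime, endTime;  // seconds (assumed unit)
};

std::vector<FlowSpec> ReadFlows(std::istream& in) {
  std::size_t count = 0;
  if (!(in >> count)) throw std::runtime_error("missing flow count");
  std::vector<FlowSpec> flows(count);
  for (FlowSpec& f : flows) {
    if (!(in >> f.src >> f.dst >> f.priority >> f.packets >> f.startTime >> f.endTime)) {
      throw std::runtime_error("truncated flow line");
    }
  }
  return flows;  // anything after the declared flows is ignored
}
```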
Lines changed: 11 additions & 0 deletions
```diff
@@ -0,0 +1,11 @@
+4 1 3
+0
+0 1 40Gbps 0.001ms 0
+0 2 40Gbps 0.001ms 0
+0 3 40Gbps 0.001ms 0
+
+First line: total node #, switch node #, link #
+Second line: switch node IDs...
+src0 dst0 rate delay error_rate
+src1 dst1 rate delay error_rate
+...
```
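Following the format notes in the new topology file (node, switch and link counts; then the switch node IDs; then one `src dst rate delay error_rate` line per link), a hypothetical reader might look like the sketch below. It is not the project's parser; rates and delays are kept as strings rather than converted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader for the topology file format; not the code in third.cc.
struct LinkSpec {
  uint32_t src, dst;
  std::string rate;   // e.g. "40Gbps", kept as text
  std::string delay;  // e.g. "0.001ms", kept as text
  double errorRate;
};

struct Topology {
  uint32_t nodeCount = 0;
  std::vector<uint32_t> switchIds;
  std::vector<LinkSpec> links;
};

Topology ReadTopology(std::istream& in) {
  Topology t;
  std::size_t switchCount = 0;
  std::size_t linkCount = 0;
  if (!(in >> t.nodeCount >> switchCount >> linkCount)) {
    throw std::runtime_error("bad header line");
  }
  t.switchIds.resize(switchCount);
  for (uint32_t& id : t.switchIds) {
    if (!(in >> id)) throw std::runtime_error("missing switch id");
  }
  t.links.resize(linkCount);
  for (LinkSpec& l : t.links) {
    if (!(in >> l.src >> l.dst >> l.rate >> l.delay >> l.errorRate)) {
      throw std::runtime_error("truncated link line");
    }
  }
  return t;
}
```

With the sample topology, node 0 is the only switch and nodes 1 to 3 each have a 40Gbps link to it; together with the flow file above (nodes 2 and 3 both sending to node 1), this gives the 2:1 incast described in the README.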
