1 | | -# ns3-rdma
2 | | -NS3 simulator for RDMA over Converged Ethernet v2 (RoCEv2), including the implementation of DCQCN, TIMELY, PFC, ECN and shared buffer switch
| 1 | +# NS-3 simulator for RDMA |
| 2 | +This is an NS-3 simulator for RDMA over Converged Ethernet v2 (RoCEv2). It includes implementations of DCQCN, TIMELY, PFC, ECN, and a Broadcom shared-buffer switch.
3 | 3 |
4 | | -# Note
5 | | -TIMELY module has not been merged into this yet. We are working on merging it. We will also add descriptions for this project soon.
| 4 | +It is based on NS-3 version 3.17 and has been ported to the Visual Studio environment, as explained [here](https://www.nsnam.org/wiki/Ns-3_on_Visual_Studio_2012).
| 5 | + |
| 6 | +## Note |
| 7 | +The TIMELY module has not been merged into this repository yet. We are working on merging it.
| 8 | + |
| 9 | +## Quick Start |
| 10 | + |
| 11 | +### Build |
| 12 | +To compile it out of the box, you need Visual Studio.
| 13 | +People have successfully built it with the *free* version,
| 14 | +which can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=48146).
| 15 | +Open windows/ns-3-dev/ns-3-dev.sln and build the whole solution.
| 16 | + |
| 17 | +You may try building it with the original Makefile, etc. We did this a while back, but you will probably need to edit a few things to make it work now.
| 18 | + |
| 19 | +### Run |
| 20 | +The binary will be generated at windows/ns-3-dev/x64/Release/main.exe. |
| 21 | +We include a sample configuration file at windows/ns-3-dev/x64/Release/mix/config.txt.
| 22 | +Execute main.exe in windows/ns-3-dev/x64/Release/: |
| 23 | +``` |
| 24 | +cd windows\ns-3-dev\x64\Release\ |
| 25 | +main.exe mix\config.txt |
| 26 | +``` |
| 27 | + |
| 28 | +It runs a 2:1 incast at 40 Gbps for 1 second. Please allow a few minutes for it to finish.
| 29 | +The trace will be generated at mix/mix.tr, as defined by mix/config.txt.
| 30 | + |
| 31 | +There are quite a few options in mix/config.txt. We will gradually add documentation. |
| 32 | +In the meantime, you can check the code
| 33 | +(project "main" -- source files -- "third.cc") to see how these options are parsed.
| 34 | +You can also raise issues if you have any questions. |
| 35 | + |
| 36 | +## What did we add exactly? |
| 37 | + |
| 38 | +**point-to-point/model/qbb-net-device.cc** and all other qbb-* files: |
| 39 | + |
| 40 | +DCQCN and PFC implementation. |
| 41 | +It also includes go-back-to-N and go-back-to-0, which handle packet drops due to corruption.
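
The two recovery modes can be illustrated with a minimal C++ sketch. It is not taken from the qbb-* files: the `Sender`, `Receiver`, and `Recover` names are hypothetical, and it assumes that go-back-to-N resumes from the first missing sequence number while go-back-to-0 restarts the transfer from sequence number 0.

```cpp
#include <cstdint>

// Hypothetical recovery modes; names are illustrative, not from the repository.
enum class Recovery { GoBackN, GoBack0 };

struct Receiver {
    uint32_t expected = 0;  // next in-order sequence number
    // Accepts only the in-order packet; any other sequence number signals a loss.
    bool OnPacket(uint32_t seq) {
        if (seq != expected) return false;
        ++expected;
        return true;
    }
};

struct Sender {
    uint32_t next = 0;  // next sequence number to transmit
    uint32_t NextToSend() { return next++; }
};

// Rewinds the sender after the receiver reports an out-of-order packet.
// Assumption: go-back-to-0 also resets the receiver, since the whole
// transfer is replayed from the first packet.
void Recover(Sender& s, Receiver& r, Recovery mode) {
    if (mode == Recovery::GoBack0) r.expected = 0;
    s.next = r.expected;
}
```

With go-back-to-N only the packets after the gap are resent, so a single drop costs roughly one window of retransmissions; go-back-to-0 is simpler but resends everything.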
| 42 | + |
| 43 | +In 2013, we obtained a very basic NS-3 PFC implementation and built on it.
| 44 | +We cannot find the original repository anymore. |
| 45 | + |
| 46 | +**network/model/broadcom-node.cc** and **.h**: |
| 47 | + |
| 48 | +This implements a Broadcom ASIC switch model, which
| 49 | +mostly performs buffer-threshold operations. These include deciding
| 50 | +whether PFC should be triggered, whether ECN should be marked, whether the buffer is so full
| 51 | +that packets should be dropped, etc. It supports both static and dynamic thresholds for PFC.
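
The difference between static and dynamic PFC thresholds can be sketched as follows. This is a hedged illustration, not the logic in broadcom-node.cc: the function names, the `SharedBuffer` struct, and the `alpha` parameter are assumptions, and the dynamic rule shown is the common "threshold = alpha × free shared buffer" scheme, which may differ from what the module actually implements.

```cpp
#include <cstdint>

// Hypothetical view of the shared buffer; field names are illustrative.
struct SharedBuffer {
    uint64_t totalBytes;  // shared pool size
    uint64_t usedBytes;   // bytes currently occupied
};

// Static threshold: pause once an ingress queue exceeds a fixed byte count.
bool ShouldPauseStatic(uint64_t ingressBytes, uint64_t thresholdBytes) {
    return ingressBytes > thresholdBytes;
}

// Dynamic threshold (assumed scheme): the limit shrinks as the shared
// pool fills up, so threshold = alpha * free bytes.
bool ShouldPauseDynamic(uint64_t ingressBytes, const SharedBuffer& buf, double alpha) {
    const uint64_t freeBytes =
        buf.totalBytes > buf.usedBytes ? buf.totalBytes - buf.usedBytes : 0;
    return static_cast<double>(ingressBytes) > alpha * static_cast<double>(freeBytes);
}
```

Under this assumed scheme, a queue may hold more when the switch is mostly idle and is paused earlier when other ports are already consuming the shared pool.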
| 52 | + |
| 53 | +*Disclaimer: this module is based purely on the authors' personal understanding of the Broadcom ASIC. It does not reflect any official confirmation from either Microsoft or Broadcom.*
| 54 | + |
| 55 | +**network/utils/broadcom-egress-queue.cc** and **.h**: |
| 56 | + |
| 57 | +This is the actual MMU that buffers packets.
| 58 | +It also includes the switch scheduler, i.e., when the upper layer asks for a packet to send, it
| 59 | +decides which queue to dequeue from. Strategies like strict priority and round robin are supported.
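
The two scheduling strategies can be sketched as below. This is an illustrative example only: `EgressPort`, `PickStrictPriority`, and `PickRoundRobin` are hypothetical names, and the sketch assumes that queue 0 has the highest priority. It only selects a queue index and leaves the actual dequeue to the caller.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical per-port egress state: one FIFO of packet sizes per queue.
struct EgressPort {
    std::vector<std::deque<int>> queues;  // assumption: queues[0] is highest priority
    std::size_t rrNext = 0;               // where the next round-robin scan starts
};

// Strict priority: serve the lowest-indexed non-empty queue.
// Returns the queue index, or -1 if every queue is empty.
int PickStrictPriority(const EgressPort& port) {
    for (std::size_t i = 0; i < port.queues.size(); ++i) {
        if (!port.queues[i].empty()) return static_cast<int>(i);
    }
    return -1;
}

// Round robin: scan from rrNext, serve the first non-empty queue,
// then advance the pointer past it so the next call starts after it.
int PickRoundRobin(EgressPort& port) {
    const std::size_t n = port.queues.size();
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t i = (port.rrNext + k) % n;
        if (!port.queues[i].empty()) {
            port.rrNext = (i + 1) % n;
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Strict priority can starve low-priority queues under sustained load, while round robin gives each backlogged queue a turn.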
| 60 | + |
| 61 | +**applications/model/udp-echo-client.cc**: |
| 62 | + |
| 63 | +We implement the RDMA client here, which aligns
| 64 | +with the fact that RoCEv2 includes a UDP header. In particular, the original UDP client has trouble
| 65 | +when PFC pauses the link. The original UDP client keeps sending packets at line rate, so it soon
| 66 | +builds up a huge queue and memory runs out. Here we throttle the sending rate if it gets
| 67 | +pushed back by PFC.
| 68 | + |
| 69 | +**internet/model/seq-ts-header.cc** and **.h**: |
| 70 | + |
| 71 | +We didn't implement the full InfiniBand |
| 72 | +header. Instead, all we really need is the sequence number (for detecting corruption
| 73 | +drops, and for helping us understand the throughput) and the timestamp (required by TIMELY).
| 74 | +This is where we encode this information into packets. |
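
As a rough illustration of carrying a sequence number and a timestamp in a packet header, here is a minimal C++ sketch. It does not reproduce seq-ts-header.cc: the `SeqTs` struct, the `Encode`/`Decode` functions, the field widths (32-bit sequence number, 64-bit timestamp), and the big-endian layout are all assumptions made for the example.

```cpp
#include <array>
#include <cstdint>

// Hypothetical header contents; widths and units are assumptions.
struct SeqTs {
    uint32_t seq;   // sequence number, used to detect gaps
    uint64_t tsNs;  // send timestamp in nanoseconds, used for RTT estimates
};

// Serializes the header into 12 bytes, most significant byte first.
std::array<uint8_t, 12> Encode(const SeqTs& h) {
    std::array<uint8_t, 12> out{};
    for (int i = 0; i < 4; ++i) out[i] = static_cast<uint8_t>(h.seq >> (8 * (3 - i)));
    for (int i = 0; i < 8; ++i) out[4 + i] = static_cast<uint8_t>(h.tsNs >> (8 * (7 - i)));
    return out;
}

// Parses the 12-byte layout produced by Encode.
SeqTs Decode(const std::array<uint8_t, 12>& in) {
    SeqTs h{0, 0};
    for (int i = 0; i < 4; ++i) h.seq = (h.seq << 8) | in[i];
    for (int i = 0; i < 8; ++i) h.tsNs = (h.tsNs << 8) | in[4 + i];
    return h;
}
```

A receiver can compare the decoded `seq` with the value it expects to detect a drop, and a sender can subtract the echoed timestamp from the current time to estimate RTT.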
| 75 | + |
| 76 | +**main/third.cc**: |
| 77 | + |
| 78 | +The main() function. |
| 79 | + |
| 80 | +There may be other edits here and there; in particular, the trace generation is scattered
| 81 | +among various network stacks. But the above are the major ones.
| 82 | + |
| 83 | +## Q&A |
| 84 | + |
| 85 | +**Q: Why do you port it to Windows?** |
| 86 | + |
| 87 | +A: This is a Microsoft project. Visual Studio, including the free version, works well. |
| 88 | + |
| 89 | +**Q: Fine. What if I want to run it on Linux, and do not want to spend time changing the build process?** |
| 90 | + |
| 91 | +A: You can build it using Visual Studio and run the .exe using WINE. We have tested WINE 1.6.2 and it works well. |
| 92 | + |
| 93 | +**Q: I don't understand ... (some part of the code or configuration)** |
| 94 | + |
| 95 | +A: Raise issues on GitHub, so that your questions can also help others. If you really do
| 96 | +not want others to know you are working on this, you can email yibzh@microsoft.com
| 97 | + |
| 98 | +**Q: What papers should I cite, if I also publish?** |
| 99 | + |
| 100 | +A: Below are the ones you should definitely check. They are ranked from most to
| 101 | +least relevant. That said, all of them are quite relevant:
| 102 | + |
| 103 | +*ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY*, CoNEXT'16 (this project is released with this paper; we ask you to cite at least this paper if you use this code.)
| 104 | + |
| 105 | +*Congestion Control for Large-scale RDMA Deployments*, SIGCOMM'15 (DCQCN) |
| 106 | + |
| 107 | +*TIMELY: RTT-based Congestion Control for the Datacenter*, SIGCOMM'15 (TIMELY) |
| 108 | + |
| 109 | +*RDMA over Commodity Ethernet at Scale*, SIGCOMM'16 (discussed go-back-to-N) |
| 110 | + |
| 111 | +*Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them*, HotNets'16 (PFC deadlock analysis, directly used this simulator.) |