- HPCA_2024_final: Contains the results of all the simulation. This results can be used to reproduce the graphs presented in the paper.
- src: Contains al the source code
- allreduce: This subdirectory contains all the AllReduce algortims presented in the paper.
- booksim2: BookSim2 simulator source code
- SavedTrees: As MultiTree takes long time to generate trees when number of accelerators are large, we initially build the tree and load during simulation.
- SCALE-Sim: Stores code for scale-sim
- utils: Contains python and shell scripts to reproduce the results.
- python_scripts: Contains python scripts to reproduce graphs in the paper. It uses results from HPCA_2024_final directory.
- run_scripts: Contains shell scripts to run the simulations.
- Ubuntu 22.04(Other Ubuntu version should work as well)
- Python 3.7.15
- ScaleSim + BookSim(We have included both the simulators with this repository)
Create a new virtual environment
python3 -m venv venv
Activate the virtual environment
source venv/bin/activate
Install the required dependencies from requirements.txt file (If required upgrade the pip)
pip install -r requirements.txt
Run the following commands to finalize the environment setup.
source setup_env.sh
The above command should show "source successfully" message.
cd src
cd booksim2
cd src
make clean
make lib
It binds booksim with python code. Now go back to the root folder.
Run the following commands to update config files topology directory.
cd src
python update_cfg_files.py
It will update the config file by appending the root directory of your system. You need to run this just once. After that you can reuse the updated config files for your experiment. Then go back to root folder.
We are providing all the simulation output files in HPCA_2024_final folder. To reproduce the graphs from the papers, first go to the python_scripts folder
cd utils/python_scripts
- To reproduce the results for Figure 8, run the following command. All the results for bandwidth are in HPCA_2024_final/bandwidth folder. Running the following script will generate a file named bandwidth.pdf.
python plot_bandwidth.py
- To reproduce the results for Figure 9, run the following command. All the results for scalability are in HPCA_2024_final/scalability folder. Running the following script will generate a file named scalability.pdf.
python plot_scalability.py
- To reproduce the results for Figure 10, run the following command. All the results for real-world DNN models are in HPCA_2024_final/models folder. To get the training time of DNN models, we simulate the models with ScaleSim simulator.
python plot_model_even.py
will generate models_even.pdf file(Fig 10 left)python plot_model_odd.py
will generate models_odd.pdf file(Fig 10 right)
- To reproduce the results for Figure 11, run the following command. All the results for overlapping computation and communication are in HPCA_2024_final/overlap folder. Running the following script will generate a file named overlap.pdf.
python plot_overlap.py
- To reproduce the results for Figure 12, run the following command. All the results for link utilization are in HPCA_2024_final/utilization folder. Running the following script will generate a file named link_utilization.pdf. This run may take hours to finish.
python plot_link_utilization.py
- To reproduce the results for Figure 13, run the following command. All the results for simba results are in HPCA_2024_final/simba folder.
python plot_simba_results_32.py
will generate models_simba_32.pdf file(Fig 13 left)python plot_simba_results_16.py
will generate models_simba_16.pdf file(Fig 13 right)
- To reproduce the results for Figure 14, run the following command. All the results for best chunk size are in HPCA_2024_final/chunk_utilization folder. Running the following script will generate a file named chunk.pdf.
python plot_chunk_utilization.py
Below we present the name used in the code of the algorithms presented in paper.
- TTO('mesh_overlap_2d_1')
- Ring AllReduce Algorithm with Even-sized Mesh('ring')
- Ring AllReduce Algorithm with Odd-sized Mesh('ring_odd')
- Hierarchical Two-dimensional Ring AllReduce Algorithm('ring2dn')
- Double Binary Tree('dtree')
- Multitree('multitree')
- Bidirectional Ring AllReduce Algorithm with Even-sized Mesh('ring_bi')
- Bidirectional Ring AllReduce Algorithm with Odd-sized Mesh('ring_odd_bi')
We provide the scripts to reproduce all the results in utils/run_scripts folder. Note that, most of the scripts parallely run multiple simulations simultaneously. So, running them in a small core machine may take most of the resources. Running the scripts will create a folder named HPCA2024 folder in the root directory and all the outputs will be stored there. Each simulation will produce a log file and a json file where json file contains the outputs. Number of cycles for the simulation can be found in results/performance inside the json file.
- Scripts in run_scripts/bw/*.sh will reproduce results of bandwidth utilization.
- Scripts in run_scripts/models/.sh* will reproduce results of all DNN benchmark performances.
- Scripts in run_scripts/overlap/.sh* will reproduce results of communication and computation overlap.
- Scripts in run_scripts/scalability/.sh* will reproduce results of scalability.
- Scripts in run_scripts/simba/.sh* will reproduce results of all DNN benchmark performances.
- Scripts in run_scripts/chunk_sensitivity.sh will reproduce results of chunk sensitivity of TTO.
- Scripts in run_scripts/link_utilization.sh will reproduce results of link utilization.
To test TTO with AlphaGoZero, you can run the following command.
./utils/mesh_overlap_2d_1_models_test.sh
This will create a json file same as the json file stored in test_result folder. All other scripts in utils/run_scripts folder should create json file similar to the outputs provided in HPCA_2024_final folder.
We use the following pattern for naming the output files.
{exp_type}_{model_name/data_size}_{allreduce_alg_type}_{total_nodes}_mesh.json
So, to test bw utilization of TTO with 512 MB data and 16 nodes, the output file name will be bw_134217728_mesh_overlap_2d_1_16_mesh.json
Here is the number of experiments in each file.
- bw/ring_bi_odd_bw.sh: 22
- bw/tree_dtree_bw.sh: 44
- bw/ring_uni_even_bw.sh: 22
- bw/ring_uni_odd_bw.sh: 22
- bw/mesh_overlap_2d_1_bw.sh: 44
- bw/ring_bi_even_bw.sh: 22
- bw/ring_2dn_bw.sh: 44
- bw/multitree_bw.sh: 44
- models/ring_uni_even_models.sh: 7
- models/tree_dtree_models.sh: 14
- models/ring_2dn_models.sh: 14
- models/ring_bi_odd_models.sh: 7
- models/mesh_overlap_2d_1_models.sh: 14
- models/multitree_models.sh: 14
- models/ring_uni_odd_models.sh: 7
- models/ring_bi_even_models.sh: 7
- overlap/comm_comp_overlap.sh: Around 1002
- overlap/training_overlap.sh: 7
- scalability/ring_bi_odd_scalability.sh: 7
- scalability/ring_uni_odd_scalability.sh: 7
- scalability/ring_bi_even_scalability.sh: 7
- scalability/mesh_overlap_2d_1_scalability.sh: 14
- scalability/multitree_scalability.sh: 14
- scalability/ring_2dn_scalability.sh: 116
- scalability/ring_uni_even_scalability.sh: 7
- simba/simba_exp.sh: 84
- simba/simba_train.sh: 14
- chunk_sensitivity.sh: 11
- link_utilization.sh: 5