1. `profiling`: Collects the time consumption of each stage of the model training process in advance and saves it as data files and image files.
2. `simulation`: Simulates the model training process from the collected data files and outputs the time consumption of each stage of training.
1. It is recommended to collect data on more than 64 GPUs so that the communication measurements are more accurate.
2. `Flash Attention` information is not collected in advance. It is collected on the fly during the simulation and stored in a cache, because too many variables affect flash attention performance for advance collection to cover them all.
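The on-the-fly collection described above amounts to a measure-on-first-use cache. The following is a minimal sketch of that idea, not the project's implementation: the class `OnTheFlyCostCache`, the key layout, and `fake_measure` are all hypothetical names, and the stand-in measurement is a rough FLOP estimate rather than a real benchmark.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: parameters assumed to drive attention cost.
# (batch, seq_len, num_heads, head_dim, causal)
AttnKey = Tuple[int, int, int, int, bool]


class OnTheFlyCostCache:
    """Measures a cost the first time a configuration is seen, then reuses it."""

    def __init__(self, measure: Callable[[AttnKey], float]) -> None:
        self._measure = measure
        self._cache: Dict[AttnKey, float] = {}
        self.misses = 0  # number of times a measurement was actually taken

    def get(self, key: AttnKey) -> float:
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._measure(key)
        return self._cache[key]


def fake_measure(key: AttnKey) -> float:
    """Stand-in for a real benchmark: a rough forward-pass FLOP estimate."""
    batch, seq_len, num_heads, head_dim, causal = key
    # Two matmuls (QK^T and PV), each about 2 * seq_len^2 * head_dim FLOPs per head.
    flops = 4.0 * batch * num_heads * seq_len * seq_len * head_dim
    # Assumption: a causal mask skips roughly half of the work.
    return flops * (0.5 if causal else 1.0)


cache = OnTheFlyCostCache(fake_measure)
key = (1, 4096, 32, 128, True)
first = cache.get(key)   # measured on first use
second = cache.get(key)  # served from the cache
```

Keying the cache on the full configuration means only the shapes that a simulated run actually encounters are ever measured.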
```bash
# generate profiling data
torchrun --nproc-per-node=8 gen_profiler_data.py

# the profiling data will be saved in the following path
./prof_data
├── data.pt
└── pics
    ├── cal
    │   └── linear.jpg
    └── comm
        ├── all2all_intra_2_inter_1.jpg
        ├── all2all_intra_4_inter_1.jpg
        ├── all_gather_intra_2_inter_1.jpg
        ├── all_gather_intra_4_inter_1.jpg
        ├── all_reduce_intra_2_inter_1.jpg
        ├── all_reduce_intra_4_inter_1.jpg
        ├── broadcast_intra_2_inter_1.jpg
        ├── broadcast_intra_4_inter_1.jpg
        ├── reduce_scatter_intra_2_inter_1.jpg
        └── reduce_scatter_intra_4_inter_1.jpg
```
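The plots under `pics/comm` follow a visible `<op>_intra_<n>_inter_<m>.jpg` naming pattern. As a small illustration, the sketch below parses those names into structured fields, for example to index the plots by collective operation. The helper `parse_comm_plot` and the `CommPlot` type are hypothetical, and what the `intra` and `inter` numbers represent is not stated here, so the code treats them as plain integers.

```python
import re
from typing import NamedTuple, Optional


class CommPlot(NamedTuple):
    op: str     # collective name, e.g. "all_reduce"
    intra: int  # number following "intra" in the file name
    inter: int  # number following "inter" in the file name


# Assumed pattern, inferred only from the file names in the listing above.
_PATTERN = re.compile(
    r"^(?P<op>[a-z0-9_]+?)_intra_(?P<intra>\d+)_inter_(?P<inter>\d+)\.jpg$"
)


def parse_comm_plot(name: str) -> Optional[CommPlot]:
    """Return the parsed fields, or None if the name does not match the pattern."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return CommPlot(match["op"], int(match["intra"]), int(match["inter"]))


names = [
    "all2all_intra_2_inter_1.jpg",
    "all_reduce_intra_4_inter_1.jpg",
    "reduce_scatter_intra_4_inter_1.jpg",
    "linear.jpg",  # a calculation plot; does not match the pattern
]
parsed = [parse_comm_plot(n) for n in names]
```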
### 2.2 Run simulation
Running the solver does not require a GPU, although some packages may expect a GPU environment; if you encounter any problems, please open an issue. Currently, the solver supports only the formulaic solving method, run through `simulation_train_formulaic.py`, which requires a config file and a profiling data file as follows: