-
See the User's Guide for a discussion of the strong and weak scaling tests. Input files here: https://github.com/firemodels/fds/tree/master/Validation/MPI_Scaling_Tests/FDS_Input_Files
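If it helps anyone getting started, the strong scaling cases are normally run with one MPI process per mesh. A minimal sketch (assuming fds and mpiexec are on PATH and the input file has been downloaded from the repository above; the helper name is mine, not part of FDS):

```python
# Minimal sketch: run one strong scaling case with one MPI process
# per mesh. Assumes fds and mpiexec are on PATH and the input file
# is in the working directory.
import subprocess

def run_strong_scaling(n_meshes: int) -> None:
    case = f"strong_scaling_test_{n_meshes:03d}.fds"
    subprocess.run(["mpiexec", "-n", str(n_meshes), "fds", case], check=True)

run_strong_scaling(8)  # e.g. the 8-mesh case on 8 MPI processes
```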
-
The weak scaling tests take about a minute of CPU. The strong scaling tests take between 10 s and 30 min.
-
Dear all, I just finished running several simulations for the strong scaling test, on several VMs of the available c4 family on Google Cloud. I want to share this first set of tests with you because I am not sure exactly how to interpret the results.

Experimental Setting
For each machine, run strong_scaling_test_F.fds for F in (001, 008, 032, 064, 096). This means the number of MPI processes is always 1, but the number of OpenMP threads will be 2, 4, 8, etc., matching the number of vCPUs of the machine. I am assuming that the problem can only be partitioned into up to F partitions, so we should see no speed-up once the number of OpenMP threads exceeds F. (I am not sure if that is the case, though.) I am also not sure whether the amount of work goes up linearly with the test case (i.e. whether strong_scaling_test_008.fds has 8x more work than strong_scaling_test_001.fds). Overall, 8 machines x 5 input files = 40 runs. The code for running all the tests is shown below (Appendix A), as well as all the results (Appendix B).
Initial Results

Here I am just looking separately at:

File strong_scaling_test_032
The speed-ups seem reasonable as you throw more vCPUs at the same problem, up to the number of "viable partitions" (my interpretation). But I don't really understand why the times go up on the larger machines if partitioning does not go beyond 32 parts (i.e. machines with more capacity would simply not be fully utilised, but there should be no extra overhead).
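For reference, my naive expectation here (my assumption, not something from the Guide) is that the ideal speed-up is capped by the partition count:

$$ S(p) = \frac{t_1}{t_p} \approx \min(p, F) $$

where p is the number of OpenMP threads and F the number of meshes, so times should flatten, not rise, beyond p = F.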
VM c4-standard-32

I am not sure I understand these results, unless the amount of work goes up linearly with the test case (i.e. strong_scaling_test_008.fds has 8x more work than strong_scaling_test_001.fds). If it does, then the numbers show that FDS is using threads really effectively: t_096 < 96 x t_001 / 32.

Initial comments

Overall, I am not sure exactly how to interpret these results. Many thanks!!

Appendix A: Code

For loop over machine types x input files, using our own Inductiva API.
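Since the Inductiva script itself is not reproduced here, below is a rough local stand-in for the same sweep; the vCPU list is my assumption about the c4-standard sizes, and locally the machine sizes are emulated by sweeping OMP_NUM_THREADS (which is how FDS picks up its OpenMP thread count).

```python
# Generic stand-in for the sweep (the actual runs used the Inductiva
# API to launch one VM per c4 machine type; that code is not
# reproduced here). The vCPU list below is an assumption.
import os
import subprocess

CASES = ["001", "008", "032", "064", "096"]
VCPUS = [2, 4, 8, 16, 32, 48, 96, 192]  # 8 machine sizes x 5 cases = 40 runs

for vcpus in VCPUS:
    for case in CASES:
        env = dict(os.environ, OMP_NUM_THREADS=str(vcpus))
        subprocess.run(
            ["mpiexec", "-n", "1", "fds", f"strong_scaling_test_{case}.fds"],
            env=env, check=True,
        )
```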
Appendix B: Dump of all results
Note: failures are related to lack of RAM (these machines have 4 GB per vCPU).

Appendix C: Example Log File
-
I think that these tests are missing the point. The MPI scaling tests are intended to test the MPI functionality, not OpenMP. To test OpenMP, it is best to use a single-mesh input file, because the OpenMP threads will work to speed up the 3-D do-loops. When you apply OpenMP threads to cases with different numbers and dimensions of meshes, you are confusing the speed-up offered by OpenMP with the slowdown caused by looping over loops of diminishing size. I suggest you use the MPI scaling tests for MPI, and the OMP scaling tests for OpenMP.
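For example, an OpenMP sweep on a single-mesh case would keep one MPI process and vary only the thread count. A sketch (the input file name is a placeholder for any one-mesh case):

```python
# Sweep OpenMP threads on a single-mesh case with one MPI process.
# "single_mesh_case.fds" is a placeholder; FDS reads the thread
# count from the OMP_NUM_THREADS environment variable.
import os
import subprocess

for threads in [1, 2, 4, 8, 16, 32]:
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    subprocess.run(["mpiexec", "-n", "1", "fds", "single_mesh_case.fds"],
                   env=env, check=True)
```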
-
Dear FDS community:
I am new to FDS and my interest is mostly on the computational side of the model.
I am looking for FDS input files, preferably publicly available, to benchmark the speed-ups one can obtain by running FDS on various cloud machines of several generations, AMD vs Intel, with up to 360 vCPUs. I have been unable to find example inputs that can benefit from throwing more CPUs at the problem: parallelisation depends on mesh division, and there seem to be few publicly available examples with a high degree of mesh partitioning.
Would anyone please point me to such an example?
The results of the benchmark are meant to be public, so I am especially interested in having access to publicly available input files, so that other people can also replicate the results!
I apologize if this sounds like a very stupid request but, again, my interest is mostly computational/infrastructural.
Thank you so much in advance!
Best,
L