CC omp_hello.cpp -Ofast -fopenmp -o omp_hello
srun -p cpu --ntasks=1 --time=00:05:00 --mem=20G --reservation=<redacted> --cpus-per-task=4 ./omp_hello
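For reference, a minimal OpenMP hello-world along the lines of what omp_hello.cpp likely contains (the actual file is not reproduced here, so treat this as an assumed sketch):

// Assumed sketch of omp_hello.cpp: each thread reports its ID.
#include <cstdio>
#include <omp.h>

int main() {
    // The number of threads comes from OMP_NUM_THREADS / --cpus-per-task.
    #pragma omp parallel
    {
        std::printf("Hello from thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}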
Results (matmat.cpp) without the pragma, -Ofast, or -fopenmp: took 109 milliseconds, 0.307838 GFLOP/s
Results (matmat.cpp) with -Ofast and -fopenmp: 4.19858e+06, took 20 milliseconds, 1.67772 GFLOP/s
Results (matmat.cpp) with -Ofast, -fopenmp, and the pragma: 4.19858e+06, took 7 milliseconds, 4.79349 GFLOP/s
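The pragma mentioned above is, most likely, a parallel-for over the outer loop of the matrix product. Below is a minimal, self-contained sketch of that pattern; the matrix size, initialization, and timing code of the real matmat.cpp are not shown here, so N, A, B, and C are placeholders.

// Sketch of an OpenMP-parallelized matrix product (placeholder data).
#include <cstdio>
#include <vector>

int main() {
    const int N = 512;                                   // placeholder size
    std::vector<double> A(N * N, 1.0), B(N * N, 2.0), C(N * N, 0.0);

    // The pragma distributes the rows of C over the threads; each (i, j)
    // entry is written by exactly one thread, so there are no races.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            double sum = 0.0;
            for (int k = 0; k < N; ++k)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }

    std::printf("C[0] = %g\n", C[0]);                    // keep the work observable
    return 0;
}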
Results (comp_pi.cpp); comp_pi.exe is built without the pragma, comp_pi2.exe with it:
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi2.exe
Pi is approximately: 3.13863
took 215 milliseconds
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi2.exe 10000000
Pi is approximately: 3.14094
took 432 milliseconds
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi2.exe 100000000
Pi is approximately: 3.14139
took 1050 milliseconds
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi2.exe 1000000000
Pi is approximately: 3.14157
took 6429 milliseconds
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi2.exe 10000000000
Pi is approximately: 3.14154
took 8552 milliseconds
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi.exe
Pi is approximately: 3.14275
took 32 milliseconds
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi.exe 10000000
Pi is approximately: 3.14213
took 326 milliseconds
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi.exe 100000000
Pi is approximately: 3.14186
took 2692 milliseconds
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi.exe 1000000000
Pi is approximately: 3.14159
took 19478 milliseconds
[ parallel-prog-2025-lab-2-DobDan42]$ ./comp_pi.exe 10000000000
Pi is approximately: 3.14161
took 28570 milliseconds
With a larger N the difference becomes clearly visible: for the default (small) N the parallel version is slower because of threading overhead, while for the largest N it is roughly 3x faster than the serial one.
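Whether comp_pi.cpp estimates pi by Monte Carlo sampling or by numerical integration is not stated above; in either case the pragma in comp_pi2 almost certainly relies on a sum reduction so the threads do not race on the accumulator. The sketch below assumes the classic midpoint-rule integration of 4/(1+x^2); the real program and its default N may differ.

// Sketch of an OpenMP sum reduction for approximating pi (assumed method).
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    long long n = (argc > 1) ? std::atoll(argv[1]) : 1000000LL;  // default N is a guess
    const double h = 1.0 / static_cast<double>(n);
    double sum = 0.0;

    // Each thread accumulates a private partial sum; OpenMP combines them.
    #pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i) {
        double x = (i + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }

    std::printf("Pi is approximately: %g\n", sum * h);
    return 0;
}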
Task: parallelize cfd_euler.cpp.
Default runtime values without parallelization (cfd_euler_unaltered.exe and corresponding .cpp):
3747 ms, 3405 ms, 3465 ms, 3422 ms, 4216 ms
Runtime values with parallelization (cfd_parallel.exe and cfd_euler_parallel.cpp):
3369 ms, 3072 ms, 2810 ms, 2790 ms, 2815 ms
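The speedup comes from distributing the per-cell work of each time step across threads. A sketch of that pattern is given below; update_step, nx, ny, rho, rho_new, and the placeholder flux are illustrative assumptions, not the actual code of cfd_euler_parallel.cpp (which updates all four conserved fields, not just rho).

#include <vector>

// Hypothetical per-step update: sweeps the interior cells and writes the new
// field; the real loops in cfd_euler.cpp are more involved.
void update_step(int nx, int ny, double dt,
                 const std::vector<double>& rho, std::vector<double>& rho_new)
{
    // No cell depends on another cell computed in the same sweep, so the two
    // spatial loops can be collapsed and distributed over the threads.
    #pragma omp parallel for collapse(2)
    for (int i = 1; i < nx - 1; ++i)
        for (int j = 1; j < ny - 1; ++j) {
            // Placeholder "flux": an average of the four neighbours.
            double flux = 0.25 * (rho[(i - 1) * ny + j] + rho[(i + 1) * ny + j]
                                + rho[i * ny + j - 1] + rho[i * ny + j + 1]);
            rho_new[i * ny + j] = rho[i * ny + j] - dt * flux;
        }
}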
Runtime values with parallelization (cfd_parallel_v2.exe and cfd_euler_parallel_v2.cpp):
Work in progress: investigating whether the main time-stepping loop itself can be parallelized.
This is either not possible, or it would require the threads to run in order, each executing only after the preceding one has written the new values back to the variables rho, rhou, rhov, and E. After that, the only remaining parallel part would be the calculation of the kinetic energy, which would probably give only a small improvement in runtime.
See code and comments at lines 319-331.
This code is deliberately left in a non-working state to keep the attempt documented.
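To make the dependency concrete, the time loop schematically looks like the fragment below (using the placeholder names from the sketch above): each iteration consumes the fields produced by the previous one, which is why the time loop must stay serial while only the work inside one step can be threaded.

// Loop-carried dependency: step n+1 reads the fields written by step n,
// so the iterations of the time loop cannot run concurrently.
for (int step = 0; step < nsteps; ++step) {   // must remain serial
    update_step(nx, ny, dt, rho, rho_new);    // parallel inside (see sketch above)
    std::swap(rho, rho_new);                  // new state becomes input of the next step
}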
Results:
cfd_euler_parallel.cpp holds the parallelized solution (cfd_parallel.exe).
cfd_euler_parallel_v1_2.cpp (cfd_parallel_v1_2.exe) is an extended version that also parallelizes the initialization step, although that part is not included in the timing.
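The initialization itself is an independent per-cell sweep, so it can be threaded the same way; a short sketch with the same placeholder names follows (the actual initial condition in cfd_euler_parallel_v1_2.cpp may differ).

// Hypothetical initialization sweep: every cell is set independently, so the
// loop can be threaded even though it lies outside the timed region.
#pragma omp parallel for collapse(2)
for (int i = 0; i < nx; ++i)
    for (int j = 0; j < ny; ++j) {
        rho[i * ny + j]  = 1.0;   // illustrative initial values only
        rhou[i * ny + j] = 0.0;
        rhov[i * ny + j] = 0.0;
        E[i * ny + j]    = 2.5;
    }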
Note that when working from home you should not use the reservation; for example, to compile and run the assignment you can use the following commands:
CC cfd_euler.cpp -Ofast -fopenmp -o cfd_euler
srun -p cpu --ntasks=1 --time=00:05:00 --mem=20G --cpus-per-task=4 ./cfd_euler