Skip to content

Fixing bug with FYPP macros #931

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 15, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/documentation/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
- [Running MFC](running.md)
- [Flow Visualization](visualization.md)
- [Performance](expectedPerformance.md)
- [GPU Parallelization](gpuParallelization.md)
- [MFC's Authors](authors.md)
- [References](references.md)

Expand Down
14 changes: 7 additions & 7 deletions src/simulation/m_data_output.fpp
Original file line number Diff line number Diff line change
Expand Up @@ -316,14 +316,14 @@
Rc_min_loc = minval(Rc_sf)
end if
#else
!$acc kernels
icfl_max_loc = maxval(icfl_sf)
!$acc end kernels
#:call GPU_PARALLEL(copyout='[icfl_max_loc]', copyin='[icfl_sf]')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure this works? why don't you just specify acc kernels via an option to GPU_PARALLEL? right now it looks like you're doing things differently than before when it would be easy to make them the same, but i'm quite sure why. i guess if this parallel loop works it is nicer than invoking kernels? idk

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

acc kernels requires the compiler to parallelize the surrounded code for you. The compiler will also not parallelize if it cannot guarantee data-dependency free loops. There is no OpenMP support for the compiler to analyze code and parallelize the code on behalf of the developer. If there should be maximum performance with OpenMP, then the codebase can't use GPU_KERNELS or its equivalent since that won't have GPU acceleration in OpenMP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what i'm saying is you would use kernels if the compiler is nvhpc and the offload engine is openacc. otherwise, it goes to OMP and uses whatever is appropriate for that compiler. this would all be taken care of in the macro. of course if you can find a acc parallel shortcut that also works for nvhpc + openacc that's fine with me too.

icfl_max_loc = maxval(icfl_sf)
#:endcall GPU_PARALLEL
if (viscous) then
!$acc kernels
vcfl_max_loc = maxval(vcfl_sf)
Rc_min_loc = minval(Rc_sf)
!$acc end kernels
#:call GPU_PARALLEL(copyout='[vcfl_max_loc, Rc_min_loc]', copyin='[vcfl_sf,Rc_sf]')

Check warning on line 323 in src/simulation/m_data_output.fpp

View check run for this annotation

Codecov / codecov/patch

src/simulation/m_data_output.fpp#L323

Added line #L323 was not covered by tests
vcfl_max_loc = maxval(vcfl_sf)
Rc_min_loc = minval(Rc_sf)
#:endcall GPU_PARALLEL
end if
#endif

Expand Down
6 changes: 3 additions & 3 deletions src/simulation/m_time_steppers.fpp
Original file line number Diff line number Diff line change
Expand Up @@ -993,9 +993,9 @@
end do
end do

!$acc kernels
dt_local = minval(max_dt)
!$acc end kernels
#:call GPU_PARALLEL(copyout='[dt_local]', copyin='[max_dt]')

Check warning on line 996 in src/simulation/m_time_steppers.fpp

View check run for this annotation

Codecov / codecov/patch

src/simulation/m_time_steppers.fpp#L996

Added line #L996 was not covered by tests
dt_local = minval(max_dt)
#:endcall GPU_PARALLEL

if (num_procs == 1) then
dt = dt_local
Expand Down
Loading