-
Notifications
You must be signed in to change notification settings - Fork 109
Fixing bug with FYPP macros #931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #931 +/- ##
==========================================
- Coverage 43.71% 43.68% -0.03%
==========================================
Files 68 68
Lines 18360 18363 +3
Branches 2292 2295 +3
==========================================
- Hits 8026 8022 -4
- Misses 8945 8949 +4
- Partials 1389 1392 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨No code suggestions found for the PR. |
!$acc kernels | ||
icfl_max_loc = maxval(icfl_sf) | ||
!$acc end kernels | ||
#:call GPU_PARALLEL(copyout='[icfl_max_loc]', copyin='[icfl_sf]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure this works? why don't you just specify acc kernels
via an option to GPU_PARALLEL? right now it looks like you're doing things differently than before when it would be easy to make them the same, but i'm quite sure why. i guess if this parallel loop works it is nicer than invoking kernels? idk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
acc kernels
requires the compiler to parallelize the surrounded code for you. The compiler will also not parallelize if it cannot guarantee data-dependency free loops. There is no OpenMP support for the compiler to analyze code and parallelize the code on behalf of the developer. If there should be maximum performance with OpenMP, then the codebase can't use GPU_KERNELS
or its equivalent since that won't have GPU acceleration in OpenMP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what i'm saying is you would use kernels if the compiler is nvhpc and the offload engine is openacc. otherwise, it goes to OMP and uses whatever is appropriate for that compiler. this would all be taken care of in the macro. of course if you can find a acc parallel
shortcut that also works for nvhpc + openacc that's fine with me too.
User description
Description
There was a bug with how I had replaced
acc kernels
withacc parallel
. This pull request should fix that.Fixes #(issue) [optional]
Type of change
Please delete options that are not relevant.
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration
Test Configuration:
Checklist
docs/
)examples/
that demonstrate my new feature performing as expected.They run to completion and demonstrate "interesting physics"
./mfc.sh format
before committing my codeIf your code changes any code source files (anything in
src/simulation
)To make sure the code is performing as expected on GPU devices, I have:
nvtx
ranges so that they can be identified in profiles./mfc.sh run XXXX --gpu -t simulation --nsys
, and have attached the output file (.nsys-rep
) and plain text results to this PR./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace
, and have attached the output file and plain text results to this PR.PR Type
Bug fix
Description
Replace incorrect
acc kernels
with properGPU_PARALLEL
macrosFix GPU parallelization directives in time stepping and data output
Add GPU parallelization documentation reference
Changes diagram
Changes walkthrough 📝
m_data_output.fpp
Fix GPU parallelization in data output
src/simulation/m_data_output.fpp
acc kernels
withGPU_PARALLEL
macro callsm_time_steppers.fpp
Fix GPU parallelization in time stepping
src/simulation/m_time_steppers.fpp
acc kernels
withGPU_PARALLEL
macro for dt calculationreadme.md
Add GPU parallelization documentation link
docs/documentation/readme.md