
Fixing bug with FYPP macros #931


Merged: 2 commits merged into MFlowCode:master on Jul 15, 2025

Conversation

prathi-wind (Collaborator) commented on Jul 8, 2025

User description

Description

There was a bug in how I had replaced acc kernels with acc parallel. This pull request fixes it.

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that it can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Bug fix


Description

  • Replace incorrect acc kernels with proper GPU_PARALLEL macros

  • Fix GPU parallelization directives in time stepping and data output

  • Add GPU parallelization documentation reference


Changes diagram

flowchart LR
  A["acc kernels directives"] -- "replace with" --> B["GPU_PARALLEL macros"]
  B --> C["proper copyout/copyin parameters"]
  D["documentation"] -- "add" --> E["GPU parallelization reference"]

Changes walkthrough 📝

Relevant files

Bug fix

src/simulation/m_data_output.fpp: Fix GPU parallelization in data output (+7/-7)
  • Replace acc kernels with GPU_PARALLEL macro calls
  • Add proper copyout and copyin parameters for GPU data transfer
  • Fix parallelization for CFL and viscous calculations

src/simulation/m_time_steppers.fpp: Fix GPU parallelization in time stepping (+3/-3)
  • Replace acc kernels with GPU_PARALLEL macro for dt calculation
  • Add copyout and copyin parameters for time step computation

Documentation

docs/documentation/readme.md: Add GPU parallelization documentation link (+1/-0)
  • Add reference to GPU parallelization documentation

(The before/after shape of these macro replacements is sketched below.)
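
As an illustration of the pattern applied in m_time_steppers.fpp: the variable names below are placeholders, not copied from the source, but the macro call and copyin/copyout clauses mirror the m_data_output.fpp diff quoted in the review further down.

! Before (hypothetical names): reduction offloaded via a compiler-managed kernels region
!$acc kernels
dt_min = minval(max_dt)
!$acc end kernels

! After: explicit GPU_PARALLEL region with copyin/copyout data clauses
#:call GPU_PARALLEL(copyout='[dt_min]', copyin='[max_dt]')
    dt_min = minval(max_dt)
#:endcall GPU_PARALLEL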

prathi-wind linked an issue on Jul 8, 2025 that may be closed by this pull request

codecov bot commented on Jul 9, 2025

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 43.68%. Comparing base (adcc0dd) to head (2afef60).
Report is 6 commits behind head on master.

Files with missing lines             Patch %   Lines
src/simulation/m_data_output.fpp       0.00%   1 Missing ⚠️
src/simulation/m_time_steppers.fpp    50.00%   1 Missing ⚠️

Additional details and impacted files
    @@            Coverage Diff             @@
    ##           master     #931      +/-   ##
    ==========================================
    - Coverage   43.71%   43.68%   -0.03%     
    ==========================================
      Files          68       68              
      Lines       18360    18363       +3     
      Branches     2292     2295       +3     
    ==========================================
    - Hits         8026     8022       -4     
    - Misses       8945     8949       +4     
    - Partials     1389     1392       +3     


sbryngelson marked this pull request as ready for review on July 12, 2025 at 14:11
sbryngelson requested review from a team as code owners on July 12, 2025 at 14:11

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The GPU_PARALLEL macro calls are placed inside conditional blocks that may not execute on GPU. The original acc kernels directives were outside the viscous condition, but the new GPU_PARALLEL calls are inside the conditional blocks, potentially changing execution behavior.

#:call GPU_PARALLEL(copyout='[icfl_max_loc]', copyin='[icfl_sf]')
    icfl_max_loc = maxval(icfl_sf)
#:endcall GPU_PARALLEL
if (viscous) then
    #:call GPU_PARALLEL(copyout='[vcfl_max_loc, Rc_min_loc]', copyin='[vcfl_sf,Rc_sf]')
        vcfl_max_loc = maxval(vcfl_sf)
        Rc_min_loc = minval(Rc_sf)
    #:endcall GPU_PARALLEL
end if

Missing File

A reference to 'gpuParallelization.md' is added to the documentation index, but the actual file may not exist in the repository, which could result in broken documentation links.

- [GPU Parallelization](gpuParallelization.md)
- [MFC's Authors](authors.md)


PR Code Suggestions ✨

No code suggestions found for the PR.

The following review thread is attached to this diff hunk in src/simulation/m_data_output.fpp:

!$acc kernels
icfl_max_loc = maxval(icfl_sf)
!$acc end kernels
#:call GPU_PARALLEL(copyout='[icfl_max_loc]', copyin='[icfl_sf]')

Member commented:

Are you sure this works? Why don't you just specify acc kernels via an option to GPU_PARALLEL? Right now it looks like you're doing things differently than before when it would be easy to make them the same, but I'm not quite sure why. I guess if this parallel loop works it is nicer than invoking kernels?

prathi-wind (Collaborator, Author) replied:

acc kernels asks the compiler to parallelize the enclosed code for you, and the compiler will refuse to parallelize loops it cannot prove are free of data dependencies. OpenMP has no construct that lets the compiler analyze and parallelize code on the developer's behalf in this way, so if we want maximum performance with OpenMP, the codebase can't use GPU_KERNELS or its equivalent, since that would not be GPU-accelerated under OpenMP.
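
To illustrate the distinction with generic Fortran (not MFC code): kernels leaves the offload decision to the compiler, parallel loop asserts loop independence explicitly, and only the explicit form has a direct OpenMP offload analogue.

program saxpy_demo
   implicit none
   integer, parameter :: n = 1000000
   integer :: i
   real :: a, x(n), y(n)
   a = 2.0; x = 1.0; y = 0.0

   ! With kernels, the compiler decides whether and how to parallelize:
   !$acc kernels
   do i = 1, n
      y(i) = a*x(i) + y(i)
   end do
   !$acc end kernels

   ! With parallel loop, the programmer asserts the loop is dependency-free:
   !$acc parallel loop
   do i = 1, n
      y(i) = a*x(i) + y(i)
   end do

   ! Closest OpenMP offload equivalent of the explicit form; OpenMP has no
   ! analogue of the compiler-discovered kernels construct:
   !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
   do i = 1, n
      y(i) = a*x(i) + y(i)
   end do

   print *, y(1)
end program saxpy_demo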

Member replied:

What I'm saying is you would use kernels if the compiler is NVHPC and the offload engine is OpenACC; otherwise it goes to OpenMP and uses whatever is appropriate for that compiler. This would all be taken care of in the macro. Of course, if you can find an acc parallel shortcut that also works for NVHPC + OpenACC, that's fine with me too.
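
A hypothetical sketch of the kind of dispatch being described; GPU_REGION, MFC_OpenACC, and the mode argument are illustrative names, not MFC's actual macro API.

#! Illustrative only: choose the backend-appropriate directive inside the macro,
#! so call sites never mention acc kernels or omp target directly.
#:set MFC_OpenACC = True

#:def GPU_REGION(code, mode)
#:if MFC_OpenACC and mode == 'kernels'
!$acc kernels
$:code
!$acc end kernels
#:elif MFC_OpenACC
!$acc parallel
$:code
!$acc end parallel
#:else
!$omp target teams
$:code
!$omp end target teams
#:endif
#:enddef

#! A call site stays backend-agnostic, mirroring how GPU_PARALLEL is invoked:
#:call GPU_REGION(mode='kernels')
icfl_max_loc = maxval(icfl_sf)
#:endcall GPU_REGION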

sbryngelson merged commit 55b50a5 into MFlowCode:master on Jul 15, 2025
81 of 87 checks passed

Successfully merging this pull request may close these issues.

Metadirectives kernels fixup