Optimizing StokesFOResid for LandIce 3D #1048


Merged: 8 commits merged into sandialabs:master on May 31, 2024

Conversation

@OscarAntepara (Contributor):

Optimizing StokesFOResid for LandIce 3D by cleaning up the code, removing if statements inside the kernel, doing local accumulation, and using compile-time variables for the loop bounds.
For the ant-16km test, the original code timings are:

Phalanx: Evaluator 86: [] StokesFOResid: 0.769688 - 67.6716% [8] {min=0.758983, max=0.798954, std dev=0.0195347}
Phalanx: Evaluator 15: [] StokesFOResid: 0.0437266 - 22.2496% [13] {min=0.0424196, max=0.0470457, std dev=0.00221728}

New code timings are:

Phalanx: Evaluator 86: [] StokesFOResid: 0.284896 - 43.2264% [8] {min=0.283264, max=0.287391, std dev=0.00182733}
Phalanx: Evaluator 15: [] StokesFOResid: 0.024342 - 14.3191% [13] {min=0.0241703, max=0.0244444, std dev=0.000118884}
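
For readers outside the PR, here is a minimal plain-C++ sketch of the pattern the description refers to: a per-cell kernel templated on the node count so the loop trip counts are compile-time constants, no if statements inside the loops, and accumulation into stack-local storage that is written back to the output once per node. Names, layouts, and the placeholder physics are illustrative assumptions, not the actual evaluator code.

template <int NumNodes, int NumQPs, typename ScalarT>
void stokes_fo_cell_sketch(ScalarT* residual,       // NumNodes entries for this cell
                           const ScalarT* flux,     // NumQPs entries (placeholder physics)
                           const ScalarT* wGradBF)  // NumNodes x NumQPs weighted gradients, flattened
{
  // Compile-time trip counts let the compiler unroll; no branches inside the loops.
  ScalarT res[NumNodes] = {};                       // stack-local accumulator
  for (int qp = 0; qp < NumQPs; ++qp)
    for (int node = 0; node < NumNodes; ++node)
      res[node] += flux[qp] * wGradBF[node * NumQPs + qp];

  // Each residual entry is written to (device) memory exactly once.
  for (int node = 0; node < NumNodes; ++node)
    residual[node] = res[node];
}

The ant-16km case above runs with numNodes = 8, so this would be instantiated with NumNodes = 8.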

@jewatkins (Collaborator) left a comment:

lgtm. I'm assuming the numbers are on pmgpu? Do you have the numbers on frontier?

@OscarAntepara (Contributor Author):

Yes, this is on pmgpu. I haven't tried frontier.

@mperego (Collaborator) commented May 14, 2024:

Thanks @OscarAntepara !
So the issue was that writing to MDFields (e.g., Residual(cell,node,1)) is expensive on GPUs?
Is it possible to refactor the code to avoid the duplication of this code?
https://github.com/sandialabs/Albany/blob/master/src/landIce/evaluators/LandIce_StokesFOResid_Def.hpp#L193-L209

@jewatkins (Collaborator):

> Thanks @OscarAntepara ! So the issue was that writing to MDFields (e.g., Residual(cell,node,1)) is expensive on GPUs?

Right, it's better to write locally in the internal loop, then write to device memory outside the loop. That might not be the case on CPU... Oscar, could you check CPU performance?

> Is it possible to refactor the code to avoid the duplication of this code? https://github.com/sandialabs/Albany/blob/master/src/landIce/evaluators/LandIce_StokesFOResid_Def.hpp#L193-L209

We can probably do that with an inline device function, but it might get ugly. We thought keeping the old implementation might improve readability (similar to what we do for the optimized gradient).
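
For contrast with the sketch following the PR description above, the structure being replaced looks roughly like this (plain C++, hypothetical names, not the literal Kokkos/Phalanx kernel): every += is a read-modify-write of the output field, NumNodes*NumQPs times per cell, which on GPU means repeated device-memory traffic.

template <int NumNodes, int NumQPs>
void write_through(double residual[NumNodes], const double flux[NumQPs],
                   const double wBF[NumNodes][NumQPs]) {
  // Old pattern: the residual entry is re-read and re-written at every quadrature point.
  for (int node = 0; node < NumNodes; ++node)
    for (int qp = 0; qp < NumQPs; ++qp)
      residual[node] += flux[qp] * wBF[node][qp];
  // New pattern (see the sketch above): accumulate in a stack-local array inside the
  // loops, then store each residual[node] exactly once.
}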

@OscarAntepara (Contributor Author):

Yeah, doing the accumulation locally improves data locality on the GPU, which gives better performance. I will check the CPU performance.

void StokesFOResid<EvalT, Traits>::
operator() (const LandIce_3D_Opt_Tag<NumNodes>& tag, const int& cell) const{
static constexpr int num_nodes = tag.num_nodes;
ScalarT res0[num_nodes];
@bartgol (Collaborator) commented May 14, 2024:

Does the code compile if you do the following?

ScalarT res0[num_nodes] = {0};

It should do value-initialization, and init all entries with 0's. Maybe it could save you the loop right below. However, I'm not sure if it works correctly with FadTypes...

Collaborator:

I think it should be okay, we do it in another evaluator and it hasn't caused issues with FadTypes.

Collaborator:

Or default initialization, which probably is the same as initializing it to zero.
ScalarT res0[num_nodes] = {};
Yes, FAD types have default initialization.

Collaborator:

Yes, this would work too.

@OscarAntepara (Contributor Author):

That's right. Both options work! So I'll push ScalarT res0[num_nodes] = {};
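
A tiny self-contained check of the two initializations discussed in this thread, using a stand-in MiniFad struct rather than the real Sacado Fad type (whether the real Fad types behave identically is exactly the question raised above):

#include <cassert>

// Stand-in for a Fad-like type: a value plus derivative components (hypothetical).
struct MiniFad {
  double val = 0.0;
  double dx[4] = {};
  MiniFad() = default;              // default construction zeros everything
  MiniFad(double v) : val(v) {}     // lets the literal 0 in "= {0}" convert
};

int main() {
  constexpr int num_nodes = 8;
  MiniFad a[num_nodes] = {0};       // first entry built from 0, the rest value-initialized
  MiniFad b[num_nodes] = {};        // every entry value-initialized
  assert(a[7].val == 0.0 && b[7].val == 0.0);
  return 0;
}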

@mcarlson801 (Collaborator) left a comment:

This looks good to me, thanks Oscar!

@mperego (Collaborator) commented May 14, 2024:

> We can probably do that with an inline device function, but it might get ugly. We thought keeping the old implementation might improve readability (similar to what we do for the optimized gradient).

OK, it's just that if we need to modify that computation kernel, we have to do it in multiple places.

@jewatkins (Collaborator):

> OK, it's just that if we need to modify that computation kernel, we have to do it in multiple places.

Right, probably okay to turn it into an inline function then. I don't think it will be too complicated in terms of readability.
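
Something along these lines, perhaps (hypothetical names; in the actual evaluator the helper would be marked KOKKOS_INLINE_FUNCTION so both device operators, with and without the stereographic map, can call it):

// Shared per-quadrature-point accumulation; only the extra stereographic-map terms
// would differ at the two call sites.
template <int NumNodes, typename ScalarT>
inline void accumulate_node_residuals(ScalarT (&res0)[NumNodes], ScalarT (&res1)[NumNodes],
                                       const ScalarT& flux0, const ScalarT& flux1,
                                       const ScalarT* wGradBF_qp)  // NumNodes gradient weights at this qp
{
  for (int node = 0; node < NumNodes; ++node) {
    res0[node] += flux0 * wGradBF_qp[node];
    res1[node] += flux1 * wGradBF_qp[node];
  }
}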

@OscarAntepara (Contributor Author) commented May 16, 2024:

> Right, it's better to write locally in the internal loop, then write to device memory outside the loop. That might not be the case on CPU... Oscar, could you check CPU performance?

For the 16km test on CPUs there is not much difference between the original and the new code.
Original:
Phalanx: Evaluator 15: [Residual] StokesFOResid: 0.368748 - 32.183% [13] {min=0.358496, max=0.387898, std dev=0.00415609}
Phalanx: Evaluator 86: [Jacobian] StokesFOResid: 1.13733 - 27.0871% [8] {min=1.11755, max=1.16131, std dev=0.0112813}

New:
Phalanx: Evaluator 15: [Residual] StokesFOResid: 0.368064 - 32.1713% [13] {min=0.359196, max=0.383944, std dev=0.00349569}
Phalanx: Evaluator 86: [Jacobian] StokesFOResid: 1.11119 - 26.5841% [8] {min=1.076, max=1.14491, std dev=0.0124654}

…o avoid code duplication for the operator not using StereographicMap
@OscarAntepara (Contributor Author):

> Right, probably okay to turn it into an inline function then. I don't think it will be too complicated in terms of readability.

I have modified the code to avoid the code duplication mentioned before.

@mperego (Collaborator) commented May 16, 2024:

> I have modified the code to avoid the code duplication mentioned before.

Thanks! I think it's a reasonable approach. Have you tested it already? I'm not sure if we still have some tests using Tets, in which case you would have to add a specialization for numNodes==4.

@OscarAntepara (Contributor Author):

> Thanks! I think it's a reasonable approach. Have you tested it already? I'm not sure if we still have some tests using Tets, in which case you would have to add a specialization for numNodes==4.

I have only tested it with ant-16km, which uses numNodes=8. Is there a test for LandIce 3D with numNodes=4? Is that using just tetrahedrons?

@mperego (Collaborator) commented May 16, 2024:

> I have only tested it with ant-16km, which uses numNodes=8. Is there a test for LandIce 3D with numNodes=4? Is that using just tetrahedrons?

We used to have a lot, but we converted most of them to Wedges. I couldn't find one with a quick search. Can you simply run the Albany LandIce ctests?

@OscarAntepara (Contributor Author):

> Can you simply run the Albany LandIce ctests?

Yeah, I got this:

Test project /pscratch/sd/o/oantepar/fanssie/builds/albany_pm_gpu_nouvm_gnu_sfad16
Start 1: unit_NullSpaceUtils_UnitTest_Serial
1/20 Test #1: unit_NullSpaceUtils_UnitTest_Serial ............................ Passed 9.43 sec
Start 2: unit_NullSpaceUtils_UnitTest_Parallel
2/20 Test #2: unit_NullSpaceUtils_UnitTest_Parallel .......................... Passed 14.38 sec
Start 3: unit_StringUtils_UnitTest
3/20 Test #3: unit_StringUtils_UnitTest ...................................... Passed 88.81 sec
Start 4: unit_HessianVecFad_UnitTest
4/20 Test #4: unit_HessianVecFad_UnitTest .................................... Passed 24.46 sec
Start 5: disc_stk_STKDisc_UnitTest_Serial
5/20 Test #5: disc_stk_STKDisc_UnitTest_Serial ............................... Passed 60.67 sec
Start 6: disc_stk_STKDisc_UnitTest_Parallel
6/20 Test #6: disc_stk_STKDisc_UnitTest_Parallel ............................. Passed 80.63 sec
Start 7: unit_evaluators_DOFInterpolation_UnitTest_Serial
7/20 Test #7: unit_evaluators_DOFInterpolation_UnitTest_Serial ............... Passed 35.56 sec
Start 8: unit_evaluators_DOFInterpolation_UnitTest_Parallel
8/20 Test #8: unit_evaluators_DOFInterpolation_UnitTest_Parallel ............. Passed 29.89 sec
Start 9: unit_evaluators_GatherSolution_UnitTest_Serial
9/20 Test #9: unit_evaluators_GatherSolution_UnitTest_Serial ................. Passed 29.02 sec
Start 10: unit_evaluators_GatherSolution_UnitTest_Parallel
10/20 Test #10: unit_evaluators_GatherSolution_UnitTest_Parallel ............... Passed 31.00 sec
Start 11: unit_evaluators_GatherDistributedParameter_UnitTest_Serial
11/20 Test #11: unit_evaluators_GatherDistributedParameter_UnitTest_Serial ..... Passed 31.92 sec
Start 12: unit_evaluators_GatherDistributedParameter_UnitTest_Parallel
12/20 Test #12: unit_evaluators_GatherDistributedParameter_UnitTest_Parallel ... Passed 10.94 sec
Start 13: unit_evaluators_ScatterResidual_UnitTest_Serial
13/20 Test #13: unit_evaluators_ScatterResidual_UnitTest_Serial ................***Failed 82.82 sec
Start 14: unit_evaluators_ScatterResidual_UnitTest_Parallel
14/20 Test #14: unit_evaluators_ScatterResidual_UnitTest_Parallel ..............***Failed 47.41 sec
Start 15: unit_evaluators_ScatterScalarResponse_UnitTest_Serial
15/20 Test #15: unit_evaluators_ScatterScalarResponse_UnitTest_Serial ..........***Failed 13.53 sec
Start 16: unit_evaluators_ScatterScalarResponse_UnitTest_Parallel
16/20 Test #16: unit_evaluators_ScatterScalarResponse_UnitTest_Parallel ........***Failed 152.74 sec
Start 17: LandIce_FO_Dome_Ascii
17/20 Test #17: LandIce_FO_Dome_Ascii .......................................... Passed 45.23 sec
Start 18: LandIce_FO_Dome_Restart
18/20 Test #18: LandIce_FO_Dome_Restart ........................................ Passed 58.91 sec
Start 19: landIce_FO_AIS_16km_decompMesh
19/20 Test #19: landIce_FO_AIS_16km_decompMesh .................................***Failed 43.10 sec
Start 20: landIce_FO_AIS_16km_MueLuKokkos
Failed test dependencies: landIce_FO_AIS_16km_decompMesh
20/20 Test #20: landIce_FO_AIS_16km_MueLuKokkos ................................***Not Run 0.00 sec

70% tests passed, 6 tests failed out of 20

Label Time Summary:
Forward = 104.14 sec*proc (3 tests)
LandIce = 104.14 sec*proc (3 tests)
Serial = 104.14 sec*proc (2 tests)
unit = 743.20 sec*proc (16 tests)

Total Test time (real) = 890.49 sec

The following tests FAILED:
13 - unit_evaluators_ScatterResidual_UnitTest_Serial (Failed)
14 - unit_evaluators_ScatterResidual_UnitTest_Parallel (Failed)
15 - unit_evaluators_ScatterScalarResponse_UnitTest_Serial (Failed)
16 - unit_evaluators_ScatterScalarResponse_UnitTest_Parallel (Failed)
19 - landIce_FO_AIS_16km_decompMesh (Failed)
20 - landIce_FO_AIS_16km_MueLuKokkos (Not Run)
Errors while running CTest

@mperego (Collaborator) commented May 16, 2024:

> Yeah, I got this: [the ctest output quoted above, with 6 failures out of 20 tests]

For 19 and 20, you probably need to put the Trilinos libs in your LD_LIBRARY_PATH.

@OscarAntepara (Contributor Author):

> For 19 and 20, you probably need to put the Trilinos libs in your LD_LIBRARY_PATH.

Trueee, now I got this:
Test project /pscratch/sd/o/oantepar/fanssie/builds/albany_pm_gpu_nouvm_gnu_sfad16
Start 1: unit_NullSpaceUtils_UnitTest_Serial
1/20 Test #1: unit_NullSpaceUtils_UnitTest_Serial ............................ Passed 13.21 sec
Start 2: unit_NullSpaceUtils_UnitTest_Parallel
2/20 Test #2: unit_NullSpaceUtils_UnitTest_Parallel .......................... Passed 32.46 sec
Start 3: unit_StringUtils_UnitTest
3/20 Test #3: unit_StringUtils_UnitTest ...................................... Passed 55.65 sec
Start 4: unit_HessianVecFad_UnitTest
4/20 Test #4: unit_HessianVecFad_UnitTest .................................... Passed 39.80 sec
Start 5: disc_stk_STKDisc_UnitTest_Serial
5/20 Test #5: disc_stk_STKDisc_UnitTest_Serial ............................... Passed 11.90 sec
Start 6: disc_stk_STKDisc_UnitTest_Parallel
6/20 Test #6: disc_stk_STKDisc_UnitTest_Parallel ............................. Passed 5.44 sec
Start 7: unit_evaluators_DOFInterpolation_UnitTest_Serial
7/20 Test #7: unit_evaluators_DOFInterpolation_UnitTest_Serial ............... Passed 15.41 sec
Start 8: unit_evaluators_DOFInterpolation_UnitTest_Parallel
8/20 Test #8: unit_evaluators_DOFInterpolation_UnitTest_Parallel ............. Passed 16.10 sec
Start 9: unit_evaluators_GatherSolution_UnitTest_Serial
9/20 Test #9: unit_evaluators_GatherSolution_UnitTest_Serial ................. Passed 7.26 sec
Start 10: unit_evaluators_GatherSolution_UnitTest_Parallel
10/20 Test #10: unit_evaluators_GatherSolution_UnitTest_Parallel ............... Passed 62.92 sec
Start 11: unit_evaluators_GatherDistributedParameter_UnitTest_Serial
11/20 Test #11: unit_evaluators_GatherDistributedParameter_UnitTest_Serial ..... Passed 60.16 sec
Start 12: unit_evaluators_GatherDistributedParameter_UnitTest_Parallel
12/20 Test #12: unit_evaluators_GatherDistributedParameter_UnitTest_Parallel ... Passed 41.62 sec
Start 13: unit_evaluators_ScatterResidual_UnitTest_Serial
13/20 Test #13: unit_evaluators_ScatterResidual_UnitTest_Serial ................***Failed 12.05 sec
Start 14: unit_evaluators_ScatterResidual_UnitTest_Parallel
14/20 Test #14: unit_evaluators_ScatterResidual_UnitTest_Parallel ..............***Failed 29.00 sec
Start 15: unit_evaluators_ScatterScalarResponse_UnitTest_Serial
15/20 Test #15: unit_evaluators_ScatterScalarResponse_UnitTest_Serial ..........***Failed 51.83 sec
Start 16: unit_evaluators_ScatterScalarResponse_UnitTest_Parallel
16/20 Test #16: unit_evaluators_ScatterScalarResponse_UnitTest_Parallel ........***Failed 8.81 sec
Start 17: LandIce_FO_Dome_Ascii
17/20 Test #17: LandIce_FO_Dome_Ascii .......................................... Passed 11.40 sec
Start 18: LandIce_FO_Dome_Restart
18/20 Test #18: LandIce_FO_Dome_Restart ........................................ Passed 9.23 sec
Start 19: landIce_FO_AIS_16km_decompMesh
19/20 Test #19: landIce_FO_AIS_16km_decompMesh ................................. Passed 22.59 sec
Start 20: landIce_FO_AIS_16km_MueLuKokkos
20/20 Test #20: landIce_FO_AIS_16km_MueLuKokkos ................................ Passed 14.41 sec

80% tests passed, 4 tests failed out of 20

Label Time Summary:
Forward = 35.04 sec*proc (3 tests)
LandIce = 35.04 sec*proc (3 tests)
Serial = 20.63 sec*proc (2 tests)
unit = 463.64 sec*proc (16 tests)

Total Test time (real) = 521.31 sec

The following tests FAILED:
13 - unit_evaluators_ScatterResidual_UnitTest_Serial (Failed)
14 - unit_evaluators_ScatterResidual_UnitTest_Parallel (Failed)
15 - unit_evaluators_ScatterScalarResponse_UnitTest_Serial (Failed)
16 - unit_evaluators_ScatterScalarResponse_UnitTest_Parallel (Failed)
Errors while running CTest

@mperego (Collaborator) commented May 16, 2024:

OK, I don't think that your changes are affecting the unit tests, so I think you are good to go.

In case you want to understand what's going on, you can run a single test with verbose output by doing:
ctest -VV -R unit_evaluators_ScatterResidual_UnitTest_Serial

@OscarAntepara (Contributor Author):

If people are curious, I'm seeing the same errors that you have here: https://my.cdash.org/viewTest.php?buildid=2560428

@mperego (Collaborator) commented May 16, 2024:

Thanks, we should open an issue about those tests failing.

@mcarlson801 (Collaborator):

> Thanks, we should open an issue about those tests failing.

Those tests are currently not expected to pass since these are UVM-free builds. If you want, I can start an issue that tracks the status of UVM-free tests and which are currently known to fail.

@mperego (Collaborator) commented May 16, 2024:

> Those tests are currently not expected to pass since these are UVM-free builds. If you want, I can start an issue that tracks the status of UVM-free tests and which are currently known to fail.

Oh, OK. Should we disable them in UVM-free builds? Anyway, I'm fine either way, and it's OK to do nothing if you plan to make these tests work in UVM-free builds.

@jewatkins (Collaborator):

Frontier numbers from Oscar:
Original:

Residual: 4ms
Jacobian: 71ms

New:

Residual: 1ms
Jacobian: 44ms

@jewatkins jewatkins merged commit 0f02996 into sandialabs:master May 31, 2024