Version 2.2 Release #54
gdicker1 announced in Announcements
Version 2.2 Functional Release
The EarthWorks Version 2.2 release introduces these new features:
Multi-Platform Support: Version 2.2 is the first EarthWorks release with multi-platform support. We have added GH1, a Grace-Hopper system at the Texas Advanced Computing Center (TACC). GH1 consists of two NVIDIA Grace-Hopper nodes: one compute node and one login/compile node. The Grace component is an ARM-V9 72-core processor; the Hopper component is an NVIDIA Tesla H-100 GPU. We expect a substantially larger multi-node test system called Vista to replace GH1. Some caveats for the GH1 release include:
NSF NCAR’s Derecho supercomputer remains the principal system supported in the EarthWorks release.
Multi-Component GPU Offload: Version 2.2 is the first functional release of a multi-component GPU offload capability, including the MPAS dynamical core, PUMAS microphysics (pumas_cam-release_v1.36) and RRTMG-P radiative transfer physics code. The release comes with the following caveats:
Defining Compsets & Enabling create_test: The new approach provides a more “CESM-like” create, build, run test environment. This includes definitions of tests to be used with CIME’s create_test workflow, adjustments to default values (coupling intervals and component timesteps), and definitions of some commonly used EarthWorks-specific compsets. These additions will make testing EarthWorks simpler in the future and will allow generation and comparison against baselines. The newly added compsets include:
F2000climoEW: An analogue to the F2000climo compset in CAM, but with the CICE prescribed mode swapped for the MPAS-SI prescribed mode.
F2000devEW: An analogue to the F2000dev compset in CAM, again with MPAS-SI prescribed mode instead of CICE.
FullyCoupledEW: A compset that has been mentioned in other releases, formalized here. It uses active MPAS components for the atmosphere, ocean, and seaice.
CHAOS2000: The Coupled Hexagonal Atmosphere, Ocean, and Seaice compset. Like FullyCoupledEW, but with an active river-runoff (MOSART) component as well.
CHAOS2000dev: Uses “cam_dev” physics by default instead of “CAM6” physics.
The tests are defined for Derecho and grouped into the following categories:
ew-pr: Contains tests that are expected to be run when creating a PR, to try to catch bugs, reversions, or changes that may affect EarthWorks. These tests aim to consume few core-hours, so they are not exhaustive. In this release they are 5-day “smoke tests” (forward run only) at 120 km resolution, for each supported compset, with various compilers.
ew-ver: Contains tests that can be run to verify the correctness of EarthWorks (especially versus CESM). In this release the only test described is a 1200-day “smoke test” of FHS94 to match what’s described in https://www.cesm.ucar.edu/models/simple/held-suarez. This group will be expanded in future releases.
ew-rel: Contains a broader range of test cases that the EarthWorks team expects to pass (along with ew-pr) before creating a release. In this release we tested the CHAOS2000dev compset using an 11-day “exact restart” test, for a few resolutions, and for both the Intel and NVHPC compilers. These are a starting point and will be expanded in future releases.
New Documentation: As we create more releases we hope to grow the community around EarthWorks. These documents help set some ground rules, start guiding potential contributors, and define the development practices already in place. These guides include:
Description of Model Configurations (Compsets)
See EarthWorks Supported Configurations in the GitHub wiki for more details.
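As a rough illustration of how one of the compsets listed in this release might be used, the sketch below assembles a CIME `create_newcase` call for F2000climoEW. The grid alias `mpasa120_mpasa120`, the case directory, and the `./cime/scripts` location are assumptions made for this example and are not taken from the release; check the Supported Configurations wiki page for the real resolution names. The script only prints the command unless `EW_RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch only: assemble (and optionally run) a create_newcase call for an
# EarthWorks compset. Paths and the grid alias below are assumptions.
set -euo pipefail

COMPSET="F2000climoEW"                          # compset name from this release
RES="mpasa120_mpasa120"                         # HYPOTHETICAL grid alias for a 120 km mesh
CASE_DIR="${CASE_DIR:-./cases/ew_f2000climo}"   # hypothetical case directory
CIME_SCRIPTS="${CIME_SCRIPTS:-./cime/scripts}"  # assumed CESM-style checkout layout

# Keep the command in an array so it can be inspected before it is run.
newcase_cmd=(
  "${CIME_SCRIPTS}/create_newcase"
  --case "${CASE_DIR}"
  --compset "${COMPSET}"
  --res "${RES}"
  --machine derecho
  --compiler intel
)

if [[ "${EW_RUN:-0}" == "1" ]]; then
  # Execute only when explicitly requested (needs an EarthWorks checkout).
  "${newcase_cmd[@]}"
else
  # Default: dry run that just prints the assembled command.
  echo "${newcase_cmd[*]}"
fi
```

Swapping `COMPSET` for another name from the list above (for example CHAOS2000dev) would follow the same pattern, assuming a matching resolution is supported.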
Testing
Tested Systems
NSF NCAR’s Derecho Supercomputer
The majority of tests occurred on Derecho.
CPU-only hardware: Derecho’s CPU-only nodes consist of dual-socket, 64-core, 3rd Gen AMD EPYC™ 7763 Milan processors with 256 GB of DDR4 memory.
CPU/GPU hybrid hardware: Derecho’s GPU nodes each consist of a single-socket, 64-core, 3rd Gen AMD EPYC™ 7763 Milan processor with 512 GB of DDR4 memory plus 4 NVIDIA A100 GPUs, each with 40 GB of onboard memory.
TACC’s GH1 Test System
CPU/GPU hybrid hardware: GH1 has 1 login/compile node and 1 compute node with the same hardware on each. The Grace (CPU) component is an ARM-V9 72-core processor; the Hopper component is an NVIDIA Tesla H-100 GPU.
Tested Software Stacks
Compiler Versions
Derecho:
GH1:
Libraries
Derecho:
GH1:
Testing Results
Derecho
create_test Results
To test this release, CPU-only tests were carried out on Derecho using the ew-pr and ew-rel categories as described above.
5 Day Smoke Tests (ew-pr)
ERROR: No archive entry found for components: ['ICE', 'OCN']
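For readers who want to repeat the ew-pr runs, a minimal sketch of a launch command follows. It relies on CIME’s `create_test` options `--xml-category`, `--xml-machine`, and `--xml-compiler`; the `./cime/scripts` location and the `EW_*` environment variables are assumptions introduced for this example, not part of EarthWorks. The script prints the command by default and runs it only when `EW_RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch only: assemble a create_test call for one of the EarthWorks test
# categories (ew-pr, ew-ver, ew-rel). Variable names here are hypothetical.
set -euo pipefail

CATEGORY="${EW_CATEGORY:-ew-pr}"                # test category to select
CIME_SCRIPTS="${CIME_SCRIPTS:-./cime/scripts}"  # assumed CESM-style checkout layout

test_cmd=(
  "${CIME_SCRIPTS}/create_test"
  --xml-category "${CATEGORY}"
  --xml-machine derecho
)

# Optionally restrict the selection to one compiler, e.g. EW_COMPILER=intel.
if [[ -n "${EW_COMPILER:-}" ]]; then
  test_cmd+=( --xml-compiler "${EW_COMPILER}" )
fi

if [[ "${EW_RUN:-0}" == "1" ]]; then
  # Execute only when explicitly requested (needs an EarthWorks checkout).
  "${test_cmd[@]}"
else
  # Default: dry run that just prints the assembled command.
  echo "${test_cmd[*]}"
fi
```

Setting `EW_CATEGORY=ew-rel` would select the exact restart tests instead, under the same assumptions.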
11 Day Exact Restart Tests (ew-rel)
Infinity and NaN values in array invrs_tau_xp2_zm and “Error calling advance_xp2_xpyp”.
Known Issues
Known issues by compset/compiler/resolution (CPU-only):
See the Derecho create_test Results above.
Known issues by compset/compiler/resolution (Hybrid CPU-GPU):
xmlchange during the setup of a case. For example, for a case that was just created, use this command to request rrtmgp_gpu and set a valid PCOLS value: `./xmlchange --append CAM_CONFIG_OPTS="-rad rrtmgp_gpu -pcols 2048"`
This discussion was created from the release Version 2.2 Release.