Description
Our derecho testing currently uses dedicated 128-core nodes for each test run. This is overkill for our needs: 8 cores should be sufficient.
I tried changing to use the shared 8-core develop nodes with these diffs:
diff --git a/config/derecho.yaml b/config/derecho.yaml
index 7fc0cf2..038533c 100644
--- a/config/derecho.yaml
+++ b/config/derecho.yaml
@@ -1,11 +1,12 @@
 machine:
   name: derecho
- cores_per_node: 128
+ # Use just 8 cores on the develop queue for less resource wastage (the develop queue allows shared nodes) and potentially less queue wait time
+ cores_per_node: 8
  head_node_name: derecho6
  scheduler:
  type: pbs
  account: p93300606
- queue: main
+ queue: develop
  partition: None
 
 test:
This led to a few test failures:
gfortran-pio2.5.10-O: failure in ESMF_ArrayRedistPerfUTest.F90: slightly exceeds tolerance (1.64e-2 > 1e-2)
gfortran-pio2.5.10-g:
CRASHED: mpi/g: src/Superstructure/Component/tests/ESMF_CompTunnelUTest.F90
   FAIL: mpi/g:   Check time delay on timeout ServiceLoop Actual Component E, ESMF_CompTunnelUTest.F90, line 1381:  Incorrect time delay
   FAIL: mpi/g:   ServiceLoop for the Actual Component E - timeoutFlag, ESMF_CompTunnelUTest.F90, line 1407:  Did not return ESMF_SUCCESS
   FAIL: mpi/g:   Testing timeoutFlag, ESMF_CompTunnelUTest.F90, line 1414:  timeoutFlag wrong
In addition, there were a number of other failures when reporting results or committing to the artifacts repo, but at least some of these may have been caused by problems with the derecho qstat command (see #104).
I'd like to try this again to see if we can get testing to work reliably from the develop queue, since this should greatly decrease the cost incurred by our nightly tests.
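As a rough illustration of the expected saving, here is a minimal sketch that compares the core-hours of one test run on a dedicated 128-core node with one on 8 shared cores. It assumes the charge is simply reserved cores times wall-clock hours and that the run takes the same time in both cases; the `core_hours` helper and the 1-hour wall time are hypothetical placeholders, not measured values or the actual accounting formula.

```python
def core_hours(cores: int, wall_hours: float) -> float:
    """Core-hours charged, ASSUMING charge = reserved cores x wall-clock hours."""
    return cores * wall_hours


# Hypothetical wall-clock time for one test run; not a measured value.
wall_hours = 1.0

dedicated = core_hours(128, wall_hours)  # dedicated 128-core node (main queue)
shared = core_hours(8, wall_hours)       # 8 cores on a shared develop node

# Ratio of the two charges; independent of wall_hours under the assumption above.
reduction_factor = dedicated / shared
print(f"{reduction_factor:.0f}x fewer core-hours per run")
```

Under those assumptions the ratio depends only on the core counts (128 / 8), so any change in wall time or queue wait on the develop queue would shift the real saving.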