diff --git a/docs/aurora/running-jobs-aurora.md b/docs/aurora/running-jobs-aurora.md
index 845314453b..fab61b6898 100644
--- a/docs/aurora/running-jobs-aurora.md
+++ b/docs/aurora/running-jobs-aurora.md
@@ -9,9 +9,9 @@ There are four production queues you can target in your qsub (`-q `)
 |---------------|----------|----------|----------|----------|------------------------------------------------------------------------------------------------------|
 | debug | 1 | 2 | 5 min | 1 hr | 64 nodes (non-exclusive); Max 1 job running/accruing/queued **per-user** |
 | debug-scaling | 2 | 31 | 5 min | 1 hr | Max 1 job running/accruing/queued **per-user**
-| prod | 1 | 8211-8576* | 5 min | 24 hrs | Routing queue for tiny, small, medium, and large queues; **See table below for min/max limits**|
-| prod-large | 1920 | 8211-8576* | 5 min | 24 hrs | Routing queue for large jobs;
-| legacy | 1 | 2048-2413 | 5 min | 24 hrs | Legacy routing queue with old (prior to 10/13/25) AuroraSDK. Some projects have higher priority.
+| prod | 1 | 8498* | 5 min | 24 hrs | Routing queue for tiny, small, medium, and large queues; **See table below for min/max limits**|
+| prod-large | 1920 | 8498* | 5 min | 24 hrs | Routing queue for large jobs;
+| legacy | 1 | 2126 | 5 min | 24 hrs | Legacy routing queue with old (prior to 10/13/25) AuroraSDK. Some projects have higher priority.
 | visualization | 1 | 32 | 5 min | 8 hrs | ***By request only; non-exclusive nodes*** |
@@ -22,11 +22,11 @@ There are four production queues you can target in your qsub (`-q `)
 | tiny | 1 | 512 | 5 min | 6 hrs | |
 | small | 513 | 1024 | 5 min | 12 hrs | |
 | medium | 1025 | 1919 | 5 min | 18 hrs | |
-| large | 1920 | 8211-8576* | 5 min | 24 hrs | range for stable max nodecount until new image is deployed across all the nodes
+| large | 1920 | 8498* | 5 min | 24 hrs | theoretical max; stable max nodecount may vary
 | backfill-tiny | 1 | 512 | 5 min | 6 hrs | Low priority, negative project balance |
 | backfill-small | 513 | 1024 | 5 min | 12 hrs | Low priority, negative project balance |
 | backfill-medium | 1025 | 1919 | 5 min | 18 hrs | Low priority, negative project balance |
-| backfill-large | 1920 | 7300-7500* | 5 min | 24 hrs | Low priority, negative project balance; range for stable max nodecount until new image is deployed across all the nodes |
+| backfill-large | 1920 | 8498* | 5 min | 24 hrs | Low priority, negative project balance; theoretical max; stable max nodecount may vary |
 !!! warning
diff --git a/docs/aurora/system-updates.md b/docs/aurora/system-updates.md
index 7638ccb89f..676094086b 100644
--- a/docs/aurora/system-updates.md
+++ b/docs/aurora/system-updates.md
@@ -1,5 +1,13 @@
 # Aurora System Updates
+## 2025-10-13
+The compute image with Intel's User (UMD) and Kernel Mode Drivers (KMD) (Agama 1146.12 / rolling release 2523.12), and oneAPI 2025.2.0, which was previously available in next-eval queue, is rolled out to the majority of nodes across Aurora.
2,126 nodes have the old production image and are available in a queue called `legacy`, which will be available to all teams that are unable to run against the new image. Some teams will have higher priority to run in the legacy queue. Use aurora-uan-000[7-8] nodes for the `legacy` queue as they will have the same user environment. Users will not be able to log in directly to aurora-uan-000[7-8] and will need to ssh to them after logging in to aurora.alcf.anl.gov.
+
+See https://docs.alcf.anl.gov/aurora/running-jobs-aurora/
+
+See sections ["2025-10-07"](system-updates.md/#2025-10-07) and ["2025-09-08"](system-updates.md/#2025-09-08) below for all the change log details.
+
+
 ## 2025-10-07
 The image in `next-eval` queue, and uan-0014, has been updated to AuroraSDK version 25.190.0 RC4, with the following changes: