
(3.8.0 ‐ 3.13.1) Update‐Cluster, Update‐Compute‐Fleet may fail when Compute Resources use an expired Capacity Reservation

Xuanqi He edited this page Jun 13, 2025 · 3 revisions

The issue

The following operations may fail:

  • pcluster update-cluster
  • pcluster update-compute-fleet

They fail with the error message:

Unable to parse configuration file. An error occurred when calling the DescribeCapacityReservations operation: The capacity reservation ID 'cr-xxxxx' was not found

This happens when all of the following are true:

  1. The current cluster configuration includes a ComputeResources entry with a CapacityReservationId.
  2. The specified Capacity Reservation has expired or been cancelled.
  3. No InstanceType is specified within the same ComputeResources entry.

For example:

Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: string
      ComputeResources:
        - Name: string
          MinCount: integer
          MaxCount: integer
          # InstanceType is missing
          CapacityReservationTarget:
            # The Capacity Reservation below is expired or cancelled
            CapacityReservationId: cr-01234567890abcdef
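
To see which reservations a configuration depends on before running an update, the IDs can be pulled out of the file and looked up with the AWS CLI. The sketch below is only a quick check: the helper names (list_reservation_ids, report_reservation_states) and the file name cluster-config.yaml are hypothetical and not part of ParallelCluster, and it uses a plain text match rather than a YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the pcluster CLI.

# Print each distinct CapacityReservationId found in a cluster config file,
# skipping lines that are entirely comments.
list_reservation_ids() {
  grep -v '^[[:space:]]*#' "$1" \
    | grep -oE 'CapacityReservationId:[[:space:]]*cr-[0-9a-f]+' \
    | grep -oE 'cr-[0-9a-f]+' \
    | sort -u
}

# Print the state EC2 reports for each reservation (for example "active",
# "expired" or "cancelled"). Requires AWS credentials and a configured region.
report_reservation_states() {
  local id state
  for id in $(list_reservation_ids "$1"); do
    state=$(aws ec2 describe-capacity-reservations \
      --capacity-reservation-ids "$id" \
      --query 'CapacityReservations[0].State' \
      --output text 2>/dev/null) || state="lookup failed"
    printf '%s\t%s\n' "$id" "$state"
  done
}
```

Running `report_reservation_states cluster-config.yaml` prints one line per reservation. A state other than "active", or a failed lookup, on an entry without an InstanceType suggests the configuration matches the conditions listed above.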

Affected ParallelCluster versions, OSes and schedulers

All ParallelCluster versions from 3.8.0 to 3.13.1 with the Slurm scheduler on all OSes.
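
The version range can be checked with a small script. This is a minimal sketch with a hypothetical helper name (is_affected_version); it assumes a plain MAJOR.MINOR.PATCH version string and is not part of ParallelCluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if VERSION lies in 3.8.0..3.13.1,
# return 1 if it lies outside that range, and 2 if it is not MAJOR.MINOR.PATCH.
is_affected_version() {
  local major minor patch
  # Accept only plain MAJOR.MINOR.PATCH strings.
  [[ $1 =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]] || return 2
  IFS=. read -r major minor patch <<< "$1"
  [ "$major" -eq 3 ] || return 1
  [ "$minor" -ge 8 ] && [ "$minor" -le 13 ] || return 1
  # 3.13.x is affected only up to 3.13.1.
  if [ "$minor" -eq 13 ] && [ "$patch" -gt 1 ]; then
    return 1
  fi
  return 0
}

# Example (assumes the output of `pcluster version` contains one X.Y.Z string):
#   is_affected_version "$(pcluster version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')" \
#     && echo "this CLI version is in the affected range"
```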

Mitigation

To resolve this issue, install a patched version of the CLI:

  1. Download the patched source code from the integ-tests-<VERSION> branch of the aws-parallelcluster GitHub repository.
    1. For example, if you are using pcluster version 3.13.0, use the branch integ-tests-3.13.0.
    2. Run the following command to download the source code:
      wget https://github.com/aws/aws-parallelcluster/archive/integ-tests-3.13.0.zip
      
  2. Install the new CLI from the source code
    1. Run the following commands to extract the archive and install from source:
      unzip integ-tests-<VERSION>.zip
      cd aws-parallelcluster-integ-tests-<VERSION>
      pip install ./cli
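
The download and install steps can also be combined into one script. This sketch assumes that `wget`, `unzip` and `pip` are on the PATH and that an integ-tests-<VERSION> branch exists for your version. The function names are illustrative. It uses `unzip` because the download is a zip archive, and nothing is downloaded until install_patched_cli is called.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the manual download and install steps.

# Print the archive URL for the patched branch of a given pcluster version.
patched_archive_url() {
  printf 'https://github.com/aws/aws-parallelcluster/archive/integ-tests-%s.zip\n' "$1"
}

# Download, unpack and install the patched CLI for VERSION (e.g. 3.13.0).
# Work happens in a temporary directory inside a subshell, so the caller's
# working directory is left unchanged.
install_patched_cli() {
  local version=$1 workdir
  workdir=$(mktemp -d) || return 1
  (
    cd "$workdir" || exit 1
    wget -O "integ-tests-${version}.zip" "$(patched_archive_url "$version")" || exit 1
    unzip -q "integ-tests-${version}.zip" || exit 1
    cd "aws-parallelcluster-integ-tests-${version}" || exit 1
    pip install ./cli
  )
}
```

For example, `install_patched_cli 3.13.0` installs from the integ-tests-3.13.0 branch. Running `pcluster version` afterwards is a reasonable way to confirm that the CLI still runs.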
      