[Terraform] Image creation fails after 1 hour #6894

Open
@mfontana-elem

Description

ParallelCluster version: v3.11.0
aws-parallelcluster version: v1.1.0

When I build an image via Terraform, the creation fails on the Terraform side after exactly 1 hour. I have reproduced this 3 times. According to the relevant CloudWatch logs, the build is still running at the moment Terraform fails, and it later finishes successfully. However, the resource is left tainted in the Terraform state, which prevents managing it as IaC.

Relevant Terraform output:

module.images.module.images["hpc6a.48xlarge"].aws-parallelcluster_image.main: Still creating... [1h0m1s elapsed]
╷
│ Error: Image create failed to complete.
│ 
│   with module.images.module.images["hpc6a.48xlarge"].aws-parallelcluster_image.main,
│   on ../../../modules/common/images/image/image.tf line 26, in resource "aws-parallelcluster_image" "main":
│   26: resource "aws-parallelcluster_image" "main" {
│ 
│ Error: 403 Forbidden

My first thought was that the Terraform resource code had a 1-hour timeout, so I created #6891. However, while inspecting the code, I realized it should have a 3-hour timeout.

Hence, my only suspicion is that an assumed role is reaching its maximum session duration, or something similar. How can I debug this problem?
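
To make the expired-session hypothesis concrete, here is a minimal sketch of the check, not a confirmed diagnosis. It assumes the credentials used for the image build come from an STS AssumeRole call with the default 1-hour session duration; whether that is what actually happens here is unverified. The function names `session_covers_build` and `seconds_until_expiry` are hypothetical helpers written for illustration and are not part of any AWS or Terraform API.

```python
from datetime import datetime, timedelta, timezone

# AWS STS AssumeRole issues 1-hour sessions when DurationSeconds is not set.
DEFAULT_STS_SESSION = timedelta(hours=1)


def session_covers_build(session_start, build_duration,
                         session_duration=DEFAULT_STS_SESSION):
    """Return True if credentials issued at session_start outlive the build.

    Hypothetical helper: it only compares two end times and calls no AWS API.
    """
    expires_at = session_start + session_duration
    build_ends_at = session_start + build_duration
    return build_ends_at <= expires_at


def seconds_until_expiry(expiration_iso, now=None):
    """Seconds left before an ISO 8601 'Expiration' timestamp is reached.

    Accepts a trailing 'Z' as well as an explicit UTC offset.
    """
    now = now or datetime.now(timezone.utc)
    expiration = datetime.fromisoformat(expiration_iso.replace("Z", "+00:00"))
    return (expiration - now).total_seconds()


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    # A 90-minute build does not fit into a default 1-hour session.
    print(session_covers_build(start, timedelta(minutes=90)))
    # The same build fits once the session is allowed to last 3 hours.
    print(session_covers_build(start, timedelta(minutes=90), timedelta(hours=3)))
```

If the build takes longer than the session, a 403 at the 1-hour mark would be consistent with expired credentials. The role's configured limit can be read with `aws iam get-role --role-name <role> --query Role.MaxSessionDuration`, where `<role>` stands for whichever role is actually assumed.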

Best regards,
