
Fails to run cml-launch #774

@yili-han-86


I have a GitHub workflow; the relevant part is below:

```yaml
deploy-runner:
    runs-on: ubuntu-22.04
    container:
      image: docker://iterativeai/cml:0-dvc3-base1
    steps:
      - uses: actions/checkout@v4
      - name: Deploy runner on cloud service
        # `bash {0}` runs without `-e`, so a failed launch does not abort the
        # step and the loop below can check `$?` and try the next region.
        shell: bash {0}
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS_DATA }}
        run: |
          # CLOUD_* variables are defined elsewhere in the workflow (not shown in this excerpt).
          # Split the comma-separated region list, dropping spaces around commas.
          IFS=',' read -r -a REGION_ARRAY <<< "$(echo $CLOUD_REGIONS | sed 's/ *, */,/g' )"
          SUCCESS=false
          for REGION in "${REGION_ARRAY[@]}"; do
            echo "Trying region $REGION"
            cml runner launch \
              $(if $CLOUD_SPOT; then echo "--cloud-spot"; fi) \
              --cloud=$CLOUD_SERVICE \
              --cloud-permission-set=$CLOUD_SERVICE_ACCOUNT,scopes=cloud-platform \
              --cloud-region=$REGION \
              --cloud-type=$CLOUD_TYPE \
              --cloud-hdd-size=$CLOUD_HDD_SIZE \
              --labels=eval-and-test-${{ inputs.branch }}-${{ inputs.model_env }} \
              --idle-timeout=$CLOUD_IDLE_TIMEOUT
            if [ $? -eq 0 ]; then
              echo "Successful in $REGION"
              SUCCESS=true
              break
            else
              echo "$REGION failed"
            fi
          done
          if [ "$SUCCESS" = false ]; then
            echo "All regions failed"
            exit 1
          fi
```
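To separate the shell logic from the cloud failure, the region-fallback loop can be exercised locally without credentials. Below is a minimal bash sketch that assumes a hypothetical stub, `launch_in_region`, in place of `cml runner launch`; the stub "succeeds" only for regions listed in a hypothetical `OK_REGIONS` variable. It applies the same comma/space normalization as the workflow, plus explicit trimming of leading and trailing spaces, which the original gets implicitly from the unquoted `echo $CLOUD_REGIONS`.

```bash
#!/usr/bin/env bash
# Sketch of the workflow's region-fallback loop with the cloud call stubbed out.

# Hypothetical stand-in for `cml runner launch --cloud-region=...`.
# Returns success only if the region appears in the comma-separated OK_REGIONS.
launch_in_region() {
  local region="$1"
  [[ ",${OK_REGIONS}," == *",${region},"* ]]
}

# Try each region from a comma-separated list and stop at the first success.
# Prints the successful region on stdout; progress messages go to stderr.
try_regions() {
  local regions_csv="$1"
  local -a region_array
  local region
  # Trim outer spaces, then collapse " , " into "," before splitting on commas.
  IFS=',' read -r -a region_array <<< "$(printf '%s' "$regions_csv" \
    | sed -e 's/^ *//' -e 's/ *$//' -e 's/ *, */,/g')"
  for region in "${region_array[@]}"; do
    echo "Trying region $region" >&2
    if launch_in_region "$region"; then
      echo "$region"
      return 0
    fi
    echo "$region failed" >&2
  done
  echo "All regions failed" >&2
  return 1
}
```

For example, with `OK_REGIONS=europe-west4`, calling `try_regions "us-central1 , europe-west4"` reports `us-central1 failed` on stderr, then prints `europe-west4` and returns 0.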

I keep getting this error:

```
{"level":"info","message":"iterative_cml_runner.runner: Creation errored after 19m53s"}
{"level":"error","message":"terraform error: Error: Error checking the runner status"}
2025-06-10T01:32:08.074Z [INFO]  provider: configuring client automatic mTLS
2025-06-10T01:32:08.089Z [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative args=[".terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative"]
2025-06-10T01:32:08.089Z [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative pid=166
2025-06-10T01:32:08.089Z [DEBUG] provider: waiting for RPC address: plugin=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative
2025-06-10T01:32:08.113Z [INFO]  provider.terraform-provider-iterative: configuring server automatic mTLS: timestamp=2025-06-10T01:32:08.113Z
2025-06-10T01:32:08.135Z [DEBUG] provider.terraform-provider-iterative: plugin address: address=/tmp/plugin2905256425 network=unix timestamp=2025-06-10T01:32:08.135Z
2025-06-10T01:32:08.135Z [DEBUG] provider: using plugin: version=5
2025-06-10T01:32:08.150Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2025-06-10T01:32:08.152Z [INFO]  provider: plugin process exited: plugin=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative id=166
2025-06-10T01:32:08.152Z [DEBUG] provider: plugin exited
2025-06-10T01:32:08.152Z [INFO]  provider: configuring client automatic mTLS
2025-06-10T01:32:08.159Z [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative args=[".terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative"]
2025-06-10T01:32:08.159Z [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative pid=174
2025-06-10T01:32:08.159Z [DEBUG] provider: waiting for RPC address: plugin=.terraform/providers/registry.terraform.io/iterative/iterative/0.11.20/linux_amd64/terraform-provider-iterative
2025-06-10T01:32:08.186Z [INFO]  provider.terraform-provider-iterative: configuring server automatic mTLS: timestamp=2025-06-10T01:32:08.186Z
2025-06-10T01:32:08.206Z [DEBUG] provider.terraform-provider-iterative: plugin address: address=/tmp/plugin363149033 network=unix timestamp=2025-06-10T01:32:08.206Z
2025-06-10T01:32:08.206Z [DEBUG] provider: using plugin: version=5
2025-06-10T01:32:08.222Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
```

Dependency versions:
Terraform: v1.9.8
cml: 0.20.6
iterative Terraform provider: 0.11.20

Following the suggestion in iterative/cml#1479, I changed the runner from ubuntu-latest to ubuntu-22.04, but the error still occurs.

How can I fix this?
