Commit 5788866

jemc and rdhar authored
fix: job id race condition with large, dynamic matrices (#451)
* fix: job id race condition with large, dynamic matrices

When you execute a matrix job and the matrix is dynamic (i.e. based on the output of a previous job; for example, a change detection job that looks for workspaces that had changes), GitHub won't immediately show the full job list for the current workflow run. You can observe this in the GitHub UI, where the matrix slowly gains more job instances even as the matrix job instances are starting.

Unfortunately, even after a matrix job instance has started executing, it may not be visible in the UI or API yet. This means that the API call which TF-via-PR makes to get the job id is not guaranteed to succeed, and in practice it will reliably fail if the dynamic matrix is large enough (e.g. 50 instances) and the `identifier` step of the `TF-via-PR` action is reached quickly enough.

This PR adds a workaround for that issue: the API call is retried with exponential backoff, up to a maximum number of attempts. In practice, this should avoid the race condition without introducing too much complexity, despite being a bit inelegant.

* Comment

---------

Co-authored-by: Rishav Dhar <19497993+rdhar@users.noreply.github.com>
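To make "a maximum limit of attempts" concrete, here is a minimal sketch that replays only the wait schedule, using the 1-second starting interval and the 64-second cap that appear in the diff. The variable names `total_wait`, `retries`, and `schedule` are illustrative and are not part of `action.yml`.

```shell
#!/usr/bin/env bash
# Replay the wait schedule of the retry loop without sleeping or calling the API.
# Start at 1 second, double after every wait, and stop once the next interval
# would exceed 64 seconds (the point at which the action gives up).
retry_interval=1
total_wait=0
retries=0
schedule=()

while [[ $retry_interval -le 64 ]]; do
  schedule+=("$retry_interval")
  total_wait=$((total_wait + retry_interval))
  retries=$((retries + 1))
  retry_interval=$((retry_interval * 2))
done

echo "Waits (seconds): ${schedule[*]}"
echo "Retries: $retries, worst-case total wait: $total_wait seconds"
```

Under these assumptions the loop allows 7 retries after the initial lookup, waiting 1, 2, 4, 8, 16, 32, and 64 seconds, for a worst case of 127 seconds before the step fails.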
1 parent e85a611 commit 5788866

1 file changed, with 14 additions and 1 deletion

action.yml

Lines changed: 14 additions & 1 deletion
@@ -108,6 +108,19 @@ runs:
 # For matrix jobs, join the matrix values with comma separator into a single string and get the ID of the job which contains it.
 matrix=$(echo "$GH_MATRIX" | jq --raw-output 'to_entries | map(if .value | type == "object" then (.value | to_entries[0].value) else .value end) | join(", ")')
 job_id=$(echo "$workflow_run" | jq --raw-output --arg matrix "$matrix" '.jobs[] | select(.name | contains($matrix)) | .id' | tail -n 1)
+# For dynamic matrix jobs, retry with exponential backoff until the job ID is found or a timeout occurs.
+retry_interval=1
+while [[ -z "$job_id" ]]; do
+  if [[ $retry_interval -gt 64 ]]; then
+    echo "Unable to locate job ID for matrix: $matrix."
+    exit 1
+  fi
+  echo "Waiting to locate job ID; will try again in $retry_interval seconds."
+  sleep "$retry_interval"
+  retry_interval=$((retry_interval * 2))
+  workflow_run=$(gh api /repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/attempts/${{ github.run_attempt }}/jobs --header "$GH_API" --method GET --field per_page=100)
+  job_id=$(echo "$workflow_run" | jq --raw-output --arg matrix "$matrix" '.jobs[] | select(.name | contains($matrix)) | .id' | tail -n 1)
+done
 fi
 echo "job=$job_id" >> "$GITHUB_OUTPUT"

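The retry loop in this hunk can be tried outside a workflow by swapping the `gh api` call for a stub. The following is a minimal sketch under stated assumptions: `retry_lookup`, `fake_lookup`, `SLEEP_CMD`, and the job ID `12345` are hypothetical names and values introduced for illustration, and none of them exist in `action.yml`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of action.yml): run a lookup command until it
# prints a non-empty value, doubling the wait between attempts.
# Usage: retry_lookup MAX_INTERVAL CMD [ARGS...]
retry_lookup() {
  local max_interval=$1
  shift
  local interval=1
  local result
  result=$("$@")
  while [[ -z "$result" ]]; do
    # Give up once the next wait would exceed the cap, as the diff does at 64.
    if [[ $interval -gt $max_interval ]]; then
      return 1
    fi
    # SLEEP_CMD is an illustrative override so the demo does not really wait.
    "${SLEEP_CMD:-sleep}" "$interval"
    interval=$((interval * 2))
    result=$("$@")
  done
  printf '%s\n' "$result"
}

# Stub standing in for the `gh api ... | jq ...` lookup: it prints nothing for
# the first three calls, then a made-up job ID. The call count lives in a temp
# file because command substitution runs the stub in a subshell.
counter_file=$(mktemp)
echo 0 > "$counter_file"
fake_lookup() {
  local calls
  calls=$(( $(cat "$counter_file") + 1 ))
  echo "$calls" > "$counter_file"
  if [[ $calls -gt 3 ]]; then
    echo "12345"
  fi
}

SLEEP_CMD=true  # replace sleep with a no-op for the demo
job_id=$(retry_lookup 64 fake_lookup)
echo "job=$job_id"
```

Factoring the loop into a function is a design choice of this sketch only; the commit keeps the loop inline in the `identifier` step, which avoids adding a helper to the composite action.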
@@ -172,7 +185,7 @@ runs:
 # Download plan file.
 # Get the artifact ID of the latest matching plan files for download.
 artifact_id=$(gh api /repos/${{ github.repository }}/actions/artifacts --header "$GH_API" --method GET --field "name=${{ steps.identifier.outputs.name }}" --jq '.artifacts[0].id' 2>/dev/null)
-if [ -z "$artifact_id" ]; then echo "Unable to locate plan file: ${{ steps.identifier.outputs.name }}." && exit 1; fi
+if [[ -z "$artifact_id" ]]; then echo "Unable to locate plan file: ${{ steps.identifier.outputs.name }}." && exit 1; fi
 gh api /repos/${{ github.repository }}/actions/artifacts/${artifact_id}/zip --header "$GH_API" --method GET > "${{ steps.identifier.outputs.name }}.zip"

 # Unzip the plan file to the working directory, then clean up the zip file.
