Skip to content

kube SIGINT system test: fix race in timeout handling #24496

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 7, 2024

Conversation

edsantiago
Copy link
Member

Up to now this test has been run using:

PODMAN_TIMEOUT=2 run_podman kube play ...

...and this gives podman time to start the pod before getting
the signal.

When run in parallel, under heavy load, the above command seems
to time out before podman has gotten its act together. Weird
things happen, like weird exit status and (most crucially)
zombie containers.

Solution: wait for container to actually start before we kill it.

Signed-off-by: Ed Santiago santiago@redhat.com

None

Up to now this test has been run using:

    PODMAN_TIMEOUT=2 run_podman kube play ...

...and this gives podman time to start the pod before getting
the signal.

When run in parallel, under heavy load, the above command seems
to time out before podman has gotten its act together. Weird
things happen, like weird exit status and (most crucially)
zombie containers.

Solution: wait for container to actually start before we kill it.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@openshift-ci openshift-ci bot added release-note-none approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 7, 2024
@edsantiago
Copy link
Member Author

cherrypicked from #23275 where it has been working flawlessly for weeks

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

openshift-ci bot commented Nov 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mheon
Copy link
Member

mheon commented Nov 7, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit b109a2b into containers:main Nov 7, 2024
51 of 53 checks passed
@edsantiago edsantiago deleted the sigint-flake branch November 11, 2024 13:13
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Feb 10, 2025
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Feb 10, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants