Skip to content

stop podman kube play --wait from using 100% CPU #24381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

t4chib4ne
Copy link
Contributor

@t4chib4ne t4chib4ne commented Oct 26, 2024

When running podman kube play --wait pod.yaml the podman process invoked on the commandline will use 100% of a single CPU waiting for the pod to exit.
This can be reproduced with the following file and then opening another terminal and observing the CPU load of the process.

# https://kubernetes.io/docs/concepts/workloads/pods/
apiVersion: v1
kind: Pod
metadata:
  name: "high-cpu-load"
  namespace: default
  labels:
    app: "high-cpu-load"
spec:
  containers:
  - name: high-cpu-load
    image: docker.io/library/debian:bookworm-slim
    command: ["bash"]
    args: ["-c", "handler() { exit 0; } ; trap handler TERM; sleep infinity& wait"]
  restartPolicy: Never

This is fixed by adding a proper Interval value to the options given to ContainerWait. Previously the default value of time.Duration was used which is 0 thus creating a busy-wait.

I have set it to 250ms similarly how it was done here #18149
Likewise I don't know if there are additional tests needed for this PR.

Does this PR introduce a user-facing change?

podman kube play --wait will no longer busy poll which resulted excessive cpu usage.

@openshift-ci openshift-ci bot added the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Oct 26, 2024
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like in order to prevent more of such bugs it would be much better to set a default pollInterval in func (c *Container) WaitForExit(), i.e.

if pollInterval <= 0 {
	pollInterval = DefaultWaitInterval
}

@t4chib4ne
Copy link
Contributor Author

Yes, I agree.
As a side note: I checked other functions calling Container.WaitForExit and it looks like (no 100% sure) there are no more bugs like this one except for the one from this PR. Other functions supply their own default value of 250ms.

I would like to add your suggestion by reverting my commit, adding the code and adding a comment to Container.WaitForExit pointing out the default value. Is that OK? Do I need to add tests or are the current ones sufficient?
(Sorry for asking but I am pretty new to doing PRs)

@Luap99
Copy link
Member

Luap99 commented Oct 29, 2024

I would like to add your suggestion by reverting my commit, adding the code and adding a comment to Container.WaitForExit pointing out the default value. Is that OK? Do I need to add tests or are the current ones sufficient?

Yes you can drop your commit and then add a new one with the default logic. Reverting (in terms of git revert command) makes no sense in a PR context because that adds a second commit which undo the changes so they still end up being in the git history.

I mark this as no tests needed, given the current functionality is already in some tests and pass it will be pretty hard to add a solid automatic test into our CI system. We would need to track CPU usage of the process which will be unreliable in CI systems and not something we have done before in CI tests AFAIK.

@Luap99 Luap99 added the No New Tests Allow PR to proceed without adding regression tests label Oct 29, 2024
@openshift-ci openshift-ci bot added release-note and removed do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Oct 29, 2024
@t4chib4ne t4chib4ne closed this Oct 30, 2024
@t4chib4ne t4chib4ne force-pushed the kube-play-wait-interval branch from ddff064 to 2f6fca6 Compare October 30, 2024 18:53
@t4chib4ne t4chib4ne reopened this Oct 30, 2024
Signed-off-by: Maximilian Hueter <maximilian.hueter@icloud.com>
@t4chib4ne t4chib4ne force-pushed the kube-play-wait-interval branch from 4f02cd3 to 314dece Compare October 30, 2024 19:01
@rhatdan
Copy link
Member

rhatdan commented Oct 30, 2024

/approve
LGTM
Thanks @t4chib4ne

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2024
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 1, 2024
Copy link
Contributor

openshift-ci bot commented Nov 1, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, rhatdan, t4chib4ne

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 496e7ca into containers:main Nov 1, 2024
77 checks passed
@t4chib4ne t4chib4ne deleted the kube-play-wait-interval branch November 9, 2024 15:25
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Feb 8, 2025
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Feb 8, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. No New Tests Allow PR to proceed without adding regression tests release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants