-
Notifications
You must be signed in to change notification settings - Fork 287
Keel container fails liveness probes, causing crash loop #790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@DrJosh9000running the exact same versions here without issues. Can you try to manually remove the liveness probe in the deployment so that the pod keeps running, and then port forward 9300 to locally test the probe? You should be getting something like: Also try enabling the admin UI and see if you can connect to it (sample values.yaml for the helm chart):
It looks to me more like a networking issue, but I can't think of a simple explanation for it. Networking can get as complex as you want. |
I tried experimenting again. Enabling the admin UI didn't seem to be enough, but I noticed that it was hitting resource limits. Bumping resources:
limits:
cpu: 1000m
memory: 256Mi
requests:
cpu: 500m
memory: 128Mi With these limits it used around 650m CPU starting up, and nearly all 128Mi of the request. I started looking for the reason why, and that's when I noticed that keel container image is only built for |
I've experienced the same issue, and resolved it by disabling the helm provider - I suspect there's a memory leak within the helm provider (or possibly I'm running an unsupported config that the helm provider silently chokes on). Without helm provider, memory usage is steady at around 22% of the default limit, whereas before it wanted over the default limit. |
What happens
When I install Keel using the latest Helm chart, it is repeatedly killed by kubelet because it does not respond to liveness probes (and also fails readiness probes).
Pod events:
Container log:
With debug logging enabled:
How to replicate
I ran the following:
Other notes
To check whether this could be caused by mystery broken networking on my node/cluster, I ran an nginx deployment with 1 replica and a similar liveness config as the Helm chart configures for Keel (probing
/
), and it started and ran successfully.The text was updated successfully, but these errors were encountered: