-
I'm seeing unexpected behavior in workflows that are created from CronWorkflows. Specifically, I've noticed that it happens with all workflows labeled with:

Scenario 1: The workflow starts, a pod is created and it finishes, so in this case everything works as expected.

Scenario 2: The status in the workflow object is not updated and it looks like this:

The phase is not visible in Status and "Started At" stays empty. A WorkflowRunning event appears, but no pod is created and nothing happens for 30-40s. After 30-40s I see something like this in the controller logs:

And the status changes in the workflow object:

After that the pod is created and the rest goes as it should.

Scenario 3: The status in the workflow object is not updated and it looks like this:

Nothing more happens and the workflow never starts.

I don't know Go and don't understand the internals of the controller well enough to continue debugging. This only happens to one of my Argo deployments in one cluster (the only Argo on this cluster), and I wasn't able to reproduce it on other clusters, which makes me think it is caused by some unexpected behavior of the API server. Scenario 2 appeared only after the Kubernetes upgrade from 1.18 to 1.19; before that it was only scenario 1 or 3. I'm using EKS, tested on versions 1.18, 1.19 and 1.20. Does anybody know what might be the cause of this behavior?
-
I'll add that I'm running similar Argo Workflows setups on 2 EKS clusters and the issue appears on only one of them.
-
@sarabala1979 Do you have any quick hints here?
-
It looks like your k8s API server is delaying or dropping the requests. Can you check the k8s API server logs for pod creation and workflow updates?
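If the API server audit logs are available as JSON lines, a check like this could be scripted. The following is only a rough sketch: it assumes the standard Kubernetes audit event fields (`verb`, `stage`, `objectRef`, `requestReceivedTimestamp`, `stageTimestamp`, `responseStatus`), while the function name, the threshold and the sample events are invented for illustration and are not taken from this cluster.

```python
import json
from datetime import datetime

# Audit timestamps are RFC 3339 with microseconds, e.g. 2021-10-28T11:09:00.000000Z
_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def slow_or_failed_requests(lines, name_prefix, threshold_s=1.0):
    """Return (verb, resource, name, code, latency_s) for completed
    workflow/pod requests whose object name starts with name_prefix and
    that either failed (HTTP >= 400) or took longer than threshold_s."""
    hits = []
    for line in lines:
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if ref.get("resource") not in ("workflows", "pods"):
            continue
        # Events without a name in objectRef are skipped here.
        if not ref.get("name", "").startswith(name_prefix):
            continue
        if event.get("stage") != "ResponseComplete":
            continue
        received = datetime.strptime(event["requestReceivedTimestamp"], _TS_FORMAT)
        done = datetime.strptime(event["stageTimestamp"], _TS_FORMAT)
        latency = (done - received).total_seconds()
        code = (event.get("responseStatus") or {}).get("code", 0)
        if code >= 400 or latency > threshold_s:
            hits.append((event["verb"], ref["resource"], ref["name"], code, latency))
    return hits


def _sample(verb, resource, name, code, received, done):
    # Helper that builds one made-up audit event as a JSON line.
    return json.dumps({
        "stage": "ResponseComplete",
        "verb": verb,
        "objectRef": {"resource": resource, "name": name},
        "responseStatus": {"code": code},
        "requestReceivedTimestamp": received,
        "stageTimestamp": done,
    })


# Invented sample data, only to show the shape of the input.
SAMPLE_EVENTS = [
    _sample("patch", "workflows", "my-cron-wf-1635419340", 200,
            "2021-10-28T11:09:00.000000Z", "2021-10-28T11:09:00.050000Z"),
    _sample("create", "pods", "my-cron-wf-1635419340", 201,
            "2021-10-28T11:09:05.000000Z", "2021-10-28T11:09:40.000000Z"),
    _sample("update", "workflows", "other-wf", 200,
            "2021-10-28T11:09:05.000000Z", "2021-10-28T11:09:45.000000Z"),
]

if __name__ == "__main__":
    for hit in slow_or_failed_requests(SAMPLE_EVENTS, "my-cron-wf"):
        print(hit)
```

Filtering on the workflow name prefix keeps the output limited to the one workflow under test, so any slow or rejected request for it stands out.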
On Oct 28, 2021, at 7:09 AM, Jakub Bielawski wrote:
@sarabala1979 any other ideas?
-
@sarabala1979 Hi, this took me a while. I looked at the API server metrics and didn't notice anything alarming. There are no errors in workflow read or write requests, and the same goes for pods. Reads for workflows can take up to 650ms; writes top out at 125ms. For pods I've seen reads taking up to 25s, but these periods do not correlate with the times when I was trying to run CronWorkflows; at those times reads take up to 220ms.

I've done another test on my current setup:

When I went through the EKS API server audit logs, I saw the following requests related to the workflow I was trying to run:

From that point the workflow executes normally. I've noticed that every minute the controller sends a "list" request to the API:

One other thing I've noticed, which I believe is not so important: the controller sometimes uses "update" and at other times "patch" requests, and I don't know what the difference is or why two types of requests are being used.
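On the update/patch question: in the Kubernetes API an "update" (HTTP PUT) replaces the whole object and is rejected with a conflict when the `resourceVersion` it carries is stale, while a "patch" sends only the fields that change. The sketch below imitates that difference with a hypothetical in-memory store. It is not the controller's code: the class and method names are invented, and `resourceVersion` is a plain integer here for simplicity (in the real API it is an opaque string).

```python
import copy


class Conflict(Exception):
    """Raised when an update is based on a stale resourceVersion."""


class FakeStore:
    """Hypothetical stand-in for the API server, holding a single object."""

    def __init__(self, obj):
        self.obj = copy.deepcopy(obj)
        self.obj["metadata"]["resourceVersion"] = 1

    def get(self):
        return copy.deepcopy(self.obj)

    def update(self, new_obj):
        # update: the whole object is replaced, but only if the caller
        # has seen the latest resourceVersion.
        if new_obj["metadata"]["resourceVersion"] != self.obj["metadata"]["resourceVersion"]:
            raise Conflict("object was modified, re-read and retry")
        self.obj = copy.deepcopy(new_obj)
        self.obj["metadata"]["resourceVersion"] += 1

    def patch(self, changes):
        # patch (merge style, simplified): only the given fields change.
        _merge(self.obj, changes)
        self.obj["metadata"]["resourceVersion"] += 1


def _merge(target, changes):
    """Recursively apply changes to target; a None value deletes the key."""
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        elif value is None:
            target.pop(key, None)
        else:
            target[key] = value


if __name__ == "__main__":
    store = FakeStore({"metadata": {"name": "wf", "labels": {}}, "status": {}})
    stale = store.get()          # copy read before the change below
    fresh = store.get()
    fresh["status"]["phase"] = "Running"
    store.update(fresh)          # accepted: based on the latest version
    try:
        store.update(stale)      # rejected: based on an old version
    except Conflict as err:
        print("update rejected:", err)
    store.patch({"metadata": {"labels": {"example": "true"}}})
    print(store.get())
```

A writer that only needs to touch a label or a single field can use a patch and avoid the re-read-and-retry loop, whereas a writer that owns the whole object, such as its status, benefits from the conflict check that an update provides.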
-
@sarabala1979 Ok, I've found the cause of this problem. It's in workflow/controller/estimation/estimator_factory.go:
This part of the code depends on