airflow tends to zombie tasks that should be successful #614
Comments
Hi @maxgruber19: we had to increase the memory for Airflow 2.10.4 (OK, so not 2.9.3) for our demo, even though the release notes didn't highlight anything that might require this, so OOMs are a possible cause. That was for the webservers, which was a little unexpected. If this happens regularly enough, you can try to catch it with something like:
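One way to check for OOM kills is to look at the container statuses Kubernetes reports for each pod. The following is only a rough sketch, not necessarily what was meant above: it assumes `kubectl` is installed and pointed at the right cluster, and the helper names `find_oom_killed` and `fetch_pods` are made up for illustration. It relies on the `reason` field of a terminated container state being `OOMKilled`.

```python
from __future__ import annotations

import json
import subprocess


def find_oom_killed(pod_list: dict) -> list[tuple[str, str]]:
    """Return (pod name, container name) pairs whose current or previous
    container state shows a termination with reason "OOMKilled".

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    """
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for cs in statuses:
            # A restarted container keeps its previous exit in "lastState".
            for state_key in ("state", "lastState"):
                terminated = (cs.get(state_key) or {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    hits.append((pod_name, cs.get("name", "<unknown>")))
                    break
    return hits


def fetch_pods(namespace: str) -> dict:
    """Hypothetical helper: fetch all pods in a namespace via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

With cluster access, `find_oom_killed(fetch_pods("my-namespace"))` (the namespace is a placeholder) would list any webserver, scheduler, or worker containers that were OOM-killed. Running it periodically while the zombie tasks appear may show whether the two correlate.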
Are you using gitsync to fetch your DAGs, by any chance?
@adwk67 thanks once again. I'll try increasing resources, especially for the webservers, and will let you know about the outcome. Yes, all DAGs come via gitsync.
Depending on the size/scale of the DAGs, this can cause issues: we have an open issue here. If this seems to be the problem and it can be overcome with the overrides described in that issue, do let me know and we can prioritize that issue accordingly.
system versions: 24.11.0 / 2.9.3
We observed Airflow, running with Celery executors, marking some tasks as "zombie" even though they ran successfully. Some occurrences seem to correlate with 20+ DAGs being submitted at once (most of them on a @daily schedule), but it also happens when a single DAG runs alone.
We increased the pod memory of the scheduler and workers to 8Gi (maybe that's still way too low?), but it's still an issue. I did that following the recommendation in the error message pasted below.
Log of an affected task below. The task should be listed as successful because all the underlying steps completed successfully as well.
This issue is more a question / request for Airflow experience than a typical issue / bug report.
I guess you will need further details to tell us more, so please let me know what logs / stats you need to help me 😄