airflow tends to zombie tasks that should be successful  #614

Closed
@maxgruber19

Description

system versions: 24.11.0 / 2.9.3

We have observed Airflow, running with Celery executors, marking some tasks as "zombie" even though they ran successfully. Some occurrences seem to correlate with 20+ DAGs being submitted at once (most of them on a @daily schedule), but it also happens when a single DAG is running on its own.

We increased the pod memory of the scheduler and workers to 8 Gi (maybe that is still way too low?), following the recommendation in the error message pasted below, but the issue persists.

The log of an affected task is below. The task should be listed as successful, because all of its underlying steps completed successfully as well.

This is more a question / request for Airflow experience than a typical issue / bug report.

I guess you will need further details to tell us more, so please let me know what logs / stats you need to help me 😄

airflow-worker-default-1.airflow-worker-default.mesh-platform-core.svc.cluster.local 
*** Found logs in s3: 
***   * s3://BUCKETNAME/logs/dag_id=protrans/run_id=manual__2025-04-22T13:12:23.151597+00:00/task_id=load/attempt=1.log.SchedulerJob.log 
[2025-04-22, 15:25:33 CEST] {sched.py:151} ERROR - Detected zombie job: {'full_filepath': '/stackable/app/git/current/stages/int/apps/product-protrans/dags/protrans.py', 'processor_subdir': '/stackable/app/git/.worktrees/181cc3caac63f51937ffdbf7851137d5f0fd0b49/stages/int/apps', 'msg': "{'DAG Id': 'protrans', 'Task Id': 'load', 'Run Id': 'manual__2025-04-22T13:12:23.151597+00:00', 'Hostname': 'airflow-worker-default-1.airflow-worker-default.mesh-platform-core.svc.cluster.local', 'External Executor Id': '87a1cea8-0e6f-47f8-9d0f-bbb6bac1c214'}", 'simple_task_instance': <airflow.models.taskinstance.SimpleTaskInstance object at 0x7fbd99eec760>, 'is_failure_callback': True} (See https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html#zombie-undead-tasks)
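
To check whether the zombies cluster on particular workers or DAGs, lines like the one above could be tallied across the collected logs. The following is only a minimal sketch: it assumes every zombie line has the same shape as the single example in this report, and the helper names (`parse_zombie_line`, `count_zombies_by_host`) are made up for illustration and are not part of Airflow. `SAMPLE` is an abbreviated copy of the log line above.

```python
import ast
import re
from collections import Counter

# Captures the quoted inner dict that follows 'msg': in a "Detected zombie job" line.
# Assumption: the inner dict never contains the sequence }" before its end.
MSG_RE = re.compile(r"Detected zombie job: .*?'msg': \"(\{.*?\})\"")


def parse_zombie_line(line):
    """Return the inner 'msg' dict of a zombie log line, or None if the line does not match."""
    match = MSG_RE.search(line)
    if match is None:
        return None
    # The captured text is a Python dict literal, so literal_eval can read it safely.
    return ast.literal_eval(match.group(1))


def count_zombies_by_host(lines):
    """Count zombie detections per worker hostname."""
    counts = Counter()
    for line in lines:
        info = parse_zombie_line(line)
        if info is not None:
            counts[info["Hostname"]] += 1
    return counts


# Abbreviated copy of the log line from this report (some fields left out).
SAMPLE = (
    "[2025-04-22, 15:25:33 CEST] {sched.py:151} ERROR - Detected zombie job: "
    "{'full_filepath': '/stackable/app/git/current/stages/int/apps/product-protrans/dags/protrans.py', "
    "'msg': \"{'DAG Id': 'protrans', 'Task Id': 'load', "
    "'Run Id': 'manual__2025-04-22T13:12:23.151597+00:00', "
    "'Hostname': 'airflow-worker-default-1.airflow-worker-default.mesh-platform-core.svc.cluster.local', "
    "'External Executor Id': '87a1cea8-0e6f-47f8-9d0f-bbb6bac1c214'}\", "
    "'is_failure_callback': True}"
)

if __name__ == "__main__":
    for host, n in count_zombies_by_host([SAMPLE]).items():
        print(host, n)
```

Feeding all scheduler and task log lines through `count_zombies_by_host` would show whether one worker pod accounts for most of the detections, which might help narrow down whether this is a resource problem on specific pods.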
