way to make an "all_done" task impervious to "Mark Failed" with "Downstream" selected #21242

stephenonethree · 2021-09-02T15:38:53Z

stephenonethree
Sep 2, 2021

Description

I have a cleanup task that I want to run at the end of a DAG no matter what. To attempt to do this, I am putting it at the end with trigger_rule='all_done' like so:

always_run = DummyOperator(
    task_id='always_run',
    trigger_rule='all_done',
    dag=dag
)
penultimate_task >> always_run

However, I think that this might not be completely bulletproof. Say upstream there are some running tasks and one of them fails but others are still running, and the failure was because of a bug in code. Somebody might decide to halt all job execution so they can fix their code. They might do that by clicking a task and clicking Mark Failed with Downstream selected. Based on testing, this will cause my always_run task to not run, because it gets marked as failed.

In a sense the spirit of all_done gets violated here.

In practice I think this can be a pretty big limitation, because the user behavior mentioned above may be common. People do it because it's an easy way to stop the job in the middle, fix some stuff, and restart the job by clearing statuses recursively later. That's fine until you consider that there are still some cleanup tasks that you might want to run in this situation, like if you spun up some VMs that should be taken down, etc. In my case I am reserving Flex Slots capacity in BigQuery and it's important to always take it down because it is very expensive.

It seems like there are two possible solutions:

Change "Mark Failed" behavior or have some other button like "Simulate Failed" that simulates real error conditions where things like "all_done" and "one_failed" etc get run like normal.
Allow tasks themselves to be marked as impervious to "Mark Failed Downstream."

Of course, I think what this is teaching me is that the recursive downstream markings can be dangerous, and maybe our team just needs to adopt a practice of marking only individual tasks as failed one-by-one. If you do that, there's no problem.

Use case/motivation

See above

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

collinmcnulty · 2021-09-03T20:40:20Z

collinmcnulty
Sep 3, 2021

You allude to this at the end; wouldn't you get the exact behavior you want from just marking the single task failed? Then all the tasks that aren't all_done will get marked upstream_failed but your all_done task won't be affected, right?

0 replies

stephenonethree · 2021-09-03T20:59:20Z

stephenonethree
Sep 3, 2021
Author

Yeah, that's correct. My concern though is that I think marking things as downstream failed may be one of the most common ways to clear things: In practice our DAGs often have like 50 parallel tasks, and these are the tasks that we most commonly want to mark failed. Rather than sitting there and marking each one of them as failed one-by-one, we save time by choosing the task above it and marking everything downstream of it as failed. But what I started realizing once I got into using "all_done" was how dangerous of a behavior that is, which is why I'm now looking for ways to make it less dangerous.

I suppose another possible solution to a dilemma like this might be the ability to vertically mark every one of those parallel tasks as failed so that there's less reason to develop a bad habit of using downstream failed.

Hope the motivations make sense...

0 replies

collinmcnulty · 2021-09-04T21:43:22Z

collinmcnulty
Sep 4, 2021

Your motivations do make sense, I wonder if it would be a better fit for your use case if both:
a) you made all of the parallel tasks that you would want to rerun together into a task group
b) there was a UI option to mark all tasks in a task group success/failed

Your strategy of going one task upstream and doing mark failed downstream seems like a "hack" for doing this, correct?

0 replies

stephenonethree · 2021-09-04T21:48:30Z

stephenonethree
Sep 4, 2021
Author

Yup it’s a hack. I like your proposed solution.

…

On Sat, Sep 4, 2021 at 5:43 PM Collin McNulty ***@***.***> wrote: Your motivations do make sense, I wonder if it would be a better fit for your use case if a) you made all of the parallel tasks that you would want to rerun together into a task group and b) if there was a UI option to mark all tasks in a task group success/failed. Your strategy of going one task upstream and doing mark failed downstream seems like a "hack" for doing this, correct? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#17992 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AP4PNAGAA7MIZS7WXW5NEU3UAKHIJANCNFSM5DJM4XLA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

0 replies

eladkal · 2022-01-31T21:20:29Z

eladkal
Jan 31, 2022
Collaborator

Could this really ever be solved? you are looking for a way to force your way over someone else that is choosing to ignore your all_done instructions and prevent the task to be executed. Say that we implement what you ask - what's preventing asking for a feature that will override your override?

may I also point that someone can set your DAG to OFF during execution which results in the last all_done task not to be executed.

My take on such issues is that - when one manually changes stuff one must be aware what he does and how it effect his process.

You can use doc.md to explain important notes on your DAG. I think this is the best approach here.

0 replies

stephenonethree · 2022-02-14T15:32:06Z

stephenonethree
Feb 14, 2022
Author

I think that #18097 basically supersedes this and is a better idea.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

way to make an "all_done" task impervious to "Mark Failed" with "Downstream" selected #21242

Uh oh!

{{title}}

Uh oh!

Replies: 6 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

way to make an "all_done" task impervious to "Mark Failed" with "Downstream" selected #21242

Uh oh!

stephenonethree Sep 2, 2021

Description

Use case/motivation

Related issues

Are you willing to submit a PR?

Code of Conduct

Replies: 6 comments

Uh oh!

collinmcnulty Sep 3, 2021

Uh oh!

stephenonethree Sep 3, 2021 Author

Uh oh!

Uh oh!

collinmcnulty Sep 4, 2021

Uh oh!

stephenonethree Sep 4, 2021 Author

Uh oh!

eladkal Jan 31, 2022 Collaborator

Uh oh!

stephenonethree Feb 14, 2022 Author

stephenonethree
Sep 2, 2021

collinmcnulty
Sep 3, 2021

stephenonethree
Sep 3, 2021
Author

collinmcnulty
Sep 4, 2021

stephenonethree
Sep 4, 2021
Author

eladkal
Jan 31, 2022
Collaborator

stephenonethree
Feb 14, 2022
Author