way to make an "all_done" task impervious to "Mark Failed" with "Downstream" selected #21242
Replies: 6 comments
-
You allude to this at the end; wouldn't you get the exact behavior you want from just marking the single task failed? Then all the tasks that aren't …
-
Yeah, that's correct. My concern, though, is that marking things failed downstream may be one of the most common ways to clear things. In practice our DAGs often have around 50 parallel tasks, and those are the tasks we most commonly want to mark failed. Rather than sitting there and marking each one of them failed one-by-one, we save time by choosing the task above them and marking everything downstream of it as failed. But what I started realizing once I got into using "all_done" was how dangerous that behavior is, which is why I'm now looking for ways to make it less dangerous. I suppose another possible solution to a dilemma like this might be the ability to mark every one of those parallel tasks failed in a single action, so there's less reason to develop the bad habit of using downstream-failed. Hope the motivations make sense.
-
Your motivations do make sense. I wonder if it would be a better fit for your use case if both: a) you made all of the parallel tasks that you would want to rerun together into a task group, and b) there was a UI option to mark all tasks in a task group success/failed. Your strategy of going one task upstream and doing mark-failed-downstream seems like a "hack" for doing this, correct?
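For what it's worth, part (a) can already be expressed with a `TaskGroup`; a minimal sketch, assuming Airflow 2.x (the dag/task/group ids are illustrative, and the bulk mark-failed UI option in (b) is hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.task_group import TaskGroup

with DAG(
    dag_id="taskgroup_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    # Group the parallel tasks so they can be handled as one unit
    # (and, if option (b) existed, marked failed/success together).
    with TaskGroup(group_id="parallel_work") as parallel_work:
        for i in range(3):  # the thread mentions ~50 tasks; 3 shown for brevity
            BashOperator(task_id=f"task_{i}", bash_command="echo work")

    cleanup = BashOperator(
        task_id="cleanup",
        bash_command="echo cleanup",
        trigger_rule="all_done",  # run regardless of upstream outcomes
    )

    parallel_work >> cleanup
```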
-
Yup it’s a hack. I like your proposed solution.
-
Could this really ever be solved? You are looking for a way to force your way over someone else who is choosing to ignore your trigger rule. May I also point out that someone can set your DAG to OFF during execution, which likewise results in the remaining tasks not running. My take on such issues is that when one manually changes state, one must be aware of what they are doing and how it affects their process. You can use …
-
I think that #18097 basically supersedes this and is a better idea.
-
Description
I have a cleanup task that I want to run at the end of a DAG no matter what. To attempt to do this, I am putting it at the end with `trigger_rule='all_done'`, like so:
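A minimal sketch of the layout, assuming Airflow 2.x (the task ids and `bash_command` values are illustrative placeholders, not the original snippet):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="cleanup_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    work = BashOperator(task_id="work", bash_command="echo work")

    # Cleanup runs once every upstream task has finished,
    # whether it succeeded or failed.
    always_run = BashOperator(
        task_id="always_run",
        bash_command="echo releasing BigQuery Flex Slots",
        trigger_rule="all_done",
    )

    work >> always_run
```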
However, I think that this might not be completely bulletproof. Say there are some tasks running upstream, and one of them fails because of a bug in the code while the others are still running. Somebody might decide to halt all job execution so they can fix their code. They might do that by clicking a task and clicking "Mark Failed" with "Downstream" selected. Based on testing, this will cause my `always_run` task to not run, because it gets marked as failed. In a sense, the spirit of `all_done` gets violated here.
In practice I think this can be a pretty big limitation, because the user behavior mentioned above may be common. People do it because it's an easy way to stop the job in the middle, fix some stuff, and restart the job later by clearing statuses recursively. That's fine until you consider that there are still some cleanup tasks you might want to run in this situation, like taking down VMs that you spun up. In my case I am reserving Flex Slots capacity in BigQuery, and it's important to always release it because it is very expensive.
It seems like there are two possible solutions: make a task with `trigger_rule='all_done'` impervious to the recursive "Mark Failed" with "Downstream" selected, or stop using the recursive marking in the first place.
Of course, I think what this is teaching me is that the recursive downstream markings can be dangerous, and maybe our team just needs to adopt a practice of marking only individual tasks as failed one-by-one. If you do that, there's no problem.
Use case/motivation
See above
Related issues
No response