-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
add kwarg to handle invalid files in open_mfdataset #9955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
I'm not the expert, but this looks reasonable! Any other thoughts? Assuming no one thinks it's a bad idea, we would need tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good idea.
But the way it is implemented here seems overly complicated and repetitive.
I would suggest to revert the logic: first build up the list wrapped in a single try
and then handle the three cases in the except
block.
Co-authored-by: Michael Niklas <mick.niklas@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there.
Also, we should add tests for this.
@headtr1ck Thanks for the suggestions. I have added two tests ( |
Hi @headtr1ck, I have been thinking about the handling of |
@max-sixty Can you please go through the PR. Thanks! |
I'm admittedly much less familiar with this section of the code. nothing seems wrong though! I think we should bias towards merging, so if no one has concerns then I'd vote to merge could we fix the errors in the docs? |
It seems like one test failed |
for more information, see https://pre-commit.ci
@headtr1ck Would you mind having one last look here? I'm not able to merge this without your interaction wrt to a requested change. Thanks! |
I'll close and reopen, just to check how this affects the merge rules. |
Sorry for the late reply, currently on holidays. I am not sure about this approach. For 1D arrays it works fine (even though maybe users want to have the options to have NaNs in positions of missing values, but due to xarrays indexing this should not be an issue). But for ND nested lists, basically this PR only works if the user supplies an additional file in the place of the broken file. This implies that the user actually knows that one file is broken. So I wonder if this is useful at all? I think the main use case is the following: But maybe this is supposed to be an additional PR? |
Thanks @headtr1ck, I've had similar concerns here: #9955 (comment). @pratiman-91's argument was, if the provided nested list would lead to a consistent output (after removal of any breaking file) it should return without erroring out. It's hard to imagine a real life use case, when and how this could happen. @pratiman-91 can you please elaborate your use case here?
I'd totally support this approach, although this seems not an easy task. But should go in a follow-up PR. From my side, this is PR is still a first way forward, giving the user a bit more freedom when opening/combining nested lists of files. |
Thanks, @headtr1ck and @kmuehlbauer.
In this case, two things will happen:
Filling with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor changes are still required.
Another question: what happens now if someone passes a e.g. 2x2 list of files where one is broken?
Because as far as I can tell, if errors="ignore"
this file will be silently removed but then later on the dataset cannot be constructed and quite likely will throw an error that will confuse the user.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
I have made changes based on your suggestions.
I agree, that would be the case. An important assumption is that removing the files does not affect the overall validity of the datasets. I think it should be up to the user to use that option. |
whats-new.rst
Added new argument in
open_mfdataset
to better handle the invalid files.