-
Notifications
You must be signed in to change notification settings - Fork 1
DAS-2418: Add exclusions for SMAP L3 string variables #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Nice work. I had a couple of minor nits.
def is_excluded_science_variable(var_info: VarInfoFromNetCDF4, var) -> bool: | ||
"""Returns True if variable is explicitly excluded by VarInfo configuration.""" | ||
exclusions_pattern = re.compile( | ||
'|'.join(var_info.cf_config.excluded_science_variables) | ||
) | ||
return var_info.variable_is_excluded(var, exclusions_pattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is probably the time to move this to earthdata-varinfo. Because I'll bet Josie has a similar function in her PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I certainly do.
expected_output_netcdf4_file_test05, decode_times=False | ||
) as expected_datatree, | ||
xr.open_datatree(temp_output_file_path, decode_times=False) as results_datatree, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first time I really hate black's formatting. 😬
tests/conftest.py
Outdated
sample_datatree = xr.DataTree( | ||
xr.Dataset( | ||
data_vars={ | ||
'string_time_utc_seconds': xr.DataArray(np.ones((3, 3))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing in your PR that explicitly ignores strings correct?
Separately, could this be a string variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, there is no explicit exclusion of strings.
I can see if I can make them strings to be more aligned with the variable name, but as I said, there is no specific handling for strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 69d3374. Let me know if this is what you were looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just something with an actual string var. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. I verified all of the test steps. Files without Strings and the regression tests. I know there is a change coming for the regression tests due to a phony_dim_name numbering difference. But this can merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the unit and regression tests, and they all appear to be passing! Added a couple little questions, but once those are answered I'll approve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just commenting on this file as a whole - I never thought about creating a test config file like this, kinda cool! I suppose I wonder if it might be an issue that we're not directly testing the changes to earthdata_varinfo_config.json
, but if this is an accepted practice I'll accept it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think both approaches are okay, the earthdata_varinfo_test_config.json
file includes configuration items for things like deleting attributes that don't currently exist in the earthdata_varinfo_config.json
. I think using the test config file is beneficial for our unit tests as it won't change as desired collection specific output annotations evolve (ex. unit test using a configuration item that is later removed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay got it! And that makes sense especially for the metadata annotator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do you think it'd be worthwhile to test the specific SPL3FTA configuration, or nah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The precedent set in this repository has been to avoid tests for specific collection configuration rules. I think this is mainly because it would inflate the test suite (would require tests for every collection that is configured) and we'd also need to store granules for these tests in the repository.
We've been reliant on properly testing these collection specific configuration rules in the test instructions for the PRs that add them.
Description
This PR adds the ability to exclude variables from the output. Configuration rules can be defined in the
ExcludedScienceVariables
section of the earthdata-varinfo configuration file, and the metadata-annotator will exclude any variables matching these rules from the output.Configuration rules were added to specifically exclude SMAP L3 string variables.
Jira Issue ID
DAS-2418
Local Test Steps
Check out this branch and build the new image and run the tests
Ensure your HIAB configuration has:
Restart harmony to pick up the new image:
Run the DAS-2418.ipynb notebook attached to this ticket. Inspect the downloaded outputs for each collection and ensure the following variables have been excluded:
To ensure no regressions, run the applicable das-harmony-regression-tests notebooks. Make sure to set harmony_host_url = 'http://localhost:3000/'.
PR Acceptance Checklist
CHANGELOG.md
updated to include high level summary of PR changes.docker/service_version.txt
updated if publishing a release.