Skip to content

Conversation

joeyschultz
Copy link
Contributor

@joeyschultz joeyschultz commented Oct 13, 2025

Description

This PR adds the ability to exclude variables from the output. Configuration rules can be defined in the ExcludedScienceVariables section of the earthdata-varinfo configuration file, and the metadata-annotator will exclude any variables matching these rules from the output.

Configuration rules were added to specifically exclude SMAP L3 string variables.

Jira Issue ID

DAS-2418

Local Test Steps

Check out this branch and build the new image and run the tests

./bin/build-image && ./bin/build-test && ./bin/run-test

Ensure your HIAB configuration has:

LOCALLY_DEPLOYED_SERVICES=harmony-metadata-annotator,hoss,sds-maskfill

Restart harmony to pick up the new image:

./bin/bootstrap-harmony

Run the DAS-2418.ipynb notebook attached to this ticket. Inspect the downloaded outputs for each collection and ensure the following variables have been excluded:

  • SPL3SMA:
    • /Soil_Moisture_Retrieval_Data/spacecraft_overpass_time_utc
  • SPL3SMAP:
    • /Soil_Moisture_Retrieval_Data/spacecraft_overpass_time_utc
  • SPL3FTA :
    • /Freeze_Thaw_Retrieval_Data/freeze_thaw_time_utc
    • /Freeze_Thaw_Retrieval_Data/thaw_reference_date
    • /Freeze_Thaw_Retrieval_Data/freeze_reference_date
  • SPL3FTP:
    • /Freeze_Thaw_Retrieval_Data_Global/freeze_thaw_time_utc
    • /Freeze_Thaw_Retrieval_Data_Polar/freeze_thaw_time_utc
  • SPL3FTP_E:
    • /Freeze_Thaw_Retrieval_Data_Global/freeze_thaw_time_utc
    • /Freeze_Thaw_Retrieval_Data_Polar/freeze_thaw_time_utc
  • SPL3SMP:
    • /Soil_Moisture_Retrieval_Data_AM/tb_time_utc
    • /Soil_Moisture_Retrieval_Data_PM/tb_time_utc_pm
  • SPL3SMP_E:
    • /Soil_Moisture_Retrieval_Data_AM/tb_time_utc
    • /Soil_Moisture_Retrieval_Data_PM/tb_time_utc_pm
    • /Soil_Moisture_Retrieval_Data_Polar_AM/tb_time_utc
    • /Soil_Moisture_Retrieval_Data_Polar_PM/tb_time_utc_pm

To ensure no regressions, run the applicable das-harmony-regression-tests notebooks. Make sure to set harmony_host_url = 'http://localhost:3000/'.

  • Metadata_Annotator.ipynb
  • HOSS_SPL2SMAP_S_tests.ipynb
  • HOSS_SPL3FTA_tests.ipynb
  • HOSS_SPL3FTP_E_tests.ipynb
  • HOSS_SPL3FTP_tests.ipynb
  • HOSS_SPL3SMA_tests.ipynb
  • HOSS_SPL3SMAP_tests.ipynb
  • HOSS_SPL3SMP_E_tests.ipynb
  • HOSS_SPL3SMP_tests.ipynb

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • CHANGELOG.md updated to include high level summary of PR changes.
  • docker/service_version.txt updated if publishing a release.
  • Fix version added to Jira ticket if publishing a release.
  • Tests added/updated and passing.
  • Documentation updated (if needed).

@joeyschultz joeyschultz requested a review from a team October 13, 2025 18:12
Copy link
Member

@flamingbear flamingbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Nice work. I had a couple of minor nits.

Comment on lines +540 to +545
def is_excluded_science_variable(var_info: VarInfoFromNetCDF4, var) -> bool:
"""Returns True if variable is explicitly excluded by VarInfo configuration."""
exclusions_pattern = re.compile(
'|'.join(var_info.cf_config.excluded_science_variables)
)
return var_info.variable_is_excluded(var, exclusions_pattern)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is probably the time to move this to earthdata-varinfo. Because I'll bet Josie has a similar function in her PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I certainly do.

expected_output_netcdf4_file_test05, decode_times=False
) as expected_datatree,
xr.open_datatree(temp_output_file_path, decode_times=False) as results_datatree,
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first time I really hate black's formatting. 😬

sample_datatree = xr.DataTree(
xr.Dataset(
data_vars={
'string_time_utc_seconds': xr.DataArray(np.ones((3, 3))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing in your PR that explicitly ignores strings correct?

Separately, could this be a string variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, there is no explicit exclusion of strings.

I can see if I can make them strings to be more aligned with the variable name, but as I said, there is no specific handling for strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 69d3374. Let me know if this is what you were looking for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just something with an actual string var. Thanks

Copy link
Member

@flamingbear flamingbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I verified all of the test steps. Files without Strings and the regression tests. I know there is a change coming for the regression tests due to a phony_dim_name numbering difference. But this can merge.

Copy link

@lyonthefrog lyonthefrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the unit and regression tests, and they all appear to be passing! Added a couple little questions, but once those are answered I'll approve

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just commenting on this file as a whole - I never thought about creating a test config file like this, kinda cool! I suppose I wonder if it might be an issue that we're not directly testing the changes to earthdata_varinfo_config.json, but if this is an accepted practice I'll accept it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think both approaches are okay, the earthdata_varinfo_test_config.json file includes configuration items for things like deleting attributes that don't currently exist in the earthdata_varinfo_config.json. I think using the test config file is beneficial for our unit tests as it won't change as desired collection specific output annotations evolve (ex. unit test using a configuration item that is later removed).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay got it! And that makes sense especially for the metadata annotator.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you think it'd be worthwhile to test the specific SPL3FTA configuration, or nah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The precedent set in this repository has been to avoid tests for specific collection configuration rules. I think this is mainly because it would inflate the test suite (would require tests for every collection that is configured) and we'd also need to store granules for these tests in the repository.

We've been reliant on properly testing these collection specific configuration rules in the test instructions for the PRs that add them.

@lyonthefrog lyonthefrog self-requested a review October 15, 2025 20:15
@joeyschultz joeyschultz reopened this Oct 15, 2025
@joeyschultz joeyschultz merged commit 7ec3b33 into main Oct 15, 2025
5 checks passed
@joeyschultz joeyschultz deleted the DAS-2418 branch October 15, 2025 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants