Description
Short-term fix: Use Hub archives as a fallback for historical metadata in get_target_data.py
Problem
Our post-submission workflow (https://github.com/reichlab/variant-nowcast-hub/actions/runs/18546608414) failed when generating target data. Nextstrain deleted historical metadata_version.json files from S3 on October 15 to save costs, which broke CladeTime's ability to access reference tree metadata for dates before September 28, 2025.
Failed job: create-target-data (2025-08-06) — CladeTime received 404 when requesting metadata for tree_as_of="2025-08-04"
Affected rounds: All 14 rounds in the workflow matrix (July 16 - October 15, 2025) are blocked.
Solution
Modify src/get_target_data.py to read metadata directly from our own auxiliary-data/modeled-clades/*.json archives when CladeTime would fail.
Implementation Approach
- Add helper function to read ncov metadata from auxiliary-data/modeled-clades/{date}.json
- Wrap CladeTime initialization to catch TreeNotAvailableError
- Fall back to Hub archives for dates before September 28, 2025 (perhaps just the couple months of concern)
- May need to investigate CladeTime's API for providing external metadata, or consider temporarily patching CladeTime locally
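The approach above could be sketched roughly as follows. This is a minimal illustration, not the actual change to src/get_target_data.py. The function names, the `primary_loader` callable (a stand-in for whatever CladeTime call currently fetches the metadata), and the `not_available_error` parameter (a stand-in for TreeNotAvailableError) are hypothetical. The assumption that the archive stores the fields under `meta` -> `ncov` should be checked against the real files.

```python
import json
from datetime import date
from pathlib import Path

# Fields this issue lists as present in the Hub archives.
REQUIRED_FIELDS = (
    "nextclade_dataset_name_full",
    "nextclade_dataset_version",
    "nextclade_version_num",
)

# Cutoff from this issue: upstream metadata is gone for earlier dates.
ARCHIVE_CUTOFF = date(2025, 9, 28)


def read_archived_ncov_metadata(archive_dir, archive_date):
    """Read ncov metadata from {archive_dir}/{archive_date}.json.

    ASSUMPTION: the fields live under archive["meta"]["ncov"]; adjust the
    lookup if the real archive layout differs.
    """
    path = Path(archive_dir) / f"{archive_date.isoformat()}.json"
    with path.open() as f:
        archive = json.load(f)
    ncov = archive.get("meta", {}).get("ncov", {})
    missing = [key for key in REQUIRED_FIELDS if key not in ncov]
    if missing:
        raise KeyError(f"{path} is missing fields: {missing}")
    return {key: ncov[key] for key in REQUIRED_FIELDS}


def get_ncov_metadata(primary_loader, archive_dir, archive_date, tree_as_of,
                      not_available_error):
    """Try the primary source first; fall back to the Hub archive.

    primary_loader: hypothetical callable taking tree_as_of (in practice,
        a thin wrapper around the CladeTime call).
    not_available_error: exception class signalling missing upstream
        metadata (in practice, CladeTime's TreeNotAvailableError).
    """
    try:
        return primary_loader(tree_as_of)
    except not_available_error:
        # Only mask the failure for dates known to be affected.
        if tree_as_of >= ARCHIVE_CUTOFF:
            raise
        return read_archived_ncov_metadata(archive_dir, archive_date)
```

Passing the loader and the exception class in as arguments keeps the sketch independent of CladeTime's exact API, which the last bullet notes may still need investigating. Re-raising for dates on or after the cutoff avoids hiding unrelated failures.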
Testing / Definition of Done
Run workflow manually with nowcast_date=2025-10-15 to verify all 14 rounds (July 16 - October 15) generate target data successfully.
Related
- Long-term fix will be in CladeTime itself (separate issue in reichlab/cladetime)
- Our archives contain all required fields: nextclade_dataset_name_full, nextclade_dataset_version, nextclade_version_num
- Run post-submission jobs failing for dates before 2025-09-28 #775