
Short-term fix: Use Hub archives as a fallback for historical metadata #803

@trobacker

Description


Short-term fix: Use Hub archives as a fallback for historical metadata in get_target_data.py

Problem

Our post-submission workflow (https://github.com/reichlab/variant-nowcast-hub/actions/runs/18546608414) failed when generating target data. On October 15, Nextstrain deleted historical metadata_version.json files from S3 to reduce costs. This broke CladeTime's ability to access reference tree metadata for dates before September 28, 2025.

Failed job: create-target-data (2025-08-06). CladeTime received a 404 when requesting metadata for tree_as_of="2025-08-04".

Affected rounds: All 14 rounds in the workflow matrix (July 16 through October 15, 2025) are blocked.

Solution

Modify src/get_target_data.py to read metadata directly from our own auxiliary-data/modeled-clades/*.json archives when CladeTime would fail.

Implementation Approach

  1. Add helper function to read ncov metadata from auxiliary-data/modeled-clades/{date}.json
  2. Wrap CladeTime initialization to catch TreeNotAvailableError
  3. Fall back to Hub archives for dates before September 28, 2025 (possibly only for the couple of months of concern)
  4. May need to investigate whether CladeTime's API supports supplying external metadata, or consider temporarily patching CladeTime locally
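A minimal sketch of steps 1 to 3, written under several assumptions: that each archive file is named by round date, that it stores the ncov metadata under a `meta.ncov` key, and that September 28, 2025 is the cutoff. `read_archived_ncov_metadata`, `get_ncov_metadata`, `S3_HISTORY_START`, and the `load_from_cladetime` callable are hypothetical names. `TreeNotAvailableError` is defined locally only as a stand-in for CladeTime's exception, so the snippet runs without CladeTime installed.

```python
import json
from datetime import date
from pathlib import Path
from typing import Any, Callable

# Assumed cutoff: Nextstrain metadata is taken to be available on S3 from this date on.
S3_HISTORY_START = date(2025, 9, 28)


class TreeNotAvailableError(Exception):
    """Stand-in for CladeTime's exception; use the real import in get_target_data.py."""


def read_archived_ncov_metadata(round_date: str, hub_root: Path) -> dict[str, Any]:
    """Read ncov metadata from auxiliary-data/modeled-clades/{round_date}.json.

    Hypothetical layout: the metadata is assumed to live under meta -> ncov.
    """
    path = hub_root / "auxiliary-data" / "modeled-clades" / f"{round_date}.json"
    with path.open(encoding="utf-8") as f:
        archive = json.load(f)
    return archive["meta"]["ncov"]


def get_ncov_metadata(
    round_date: str,
    tree_as_of: date,
    hub_root: Path,
    load_from_cladetime: Callable[[date], dict[str, Any]],
) -> dict[str, Any]:
    """Try CladeTime first; fall back to the Hub archive only for pre-cutoff dates."""
    try:
        return load_from_cladetime(tree_as_of)
    except TreeNotAvailableError:
        if tree_as_of >= S3_HISTORY_START:
            # Metadata should still exist on S3, so surface the error instead of masking it.
            raise
        return read_archived_ncov_metadata(round_date, hub_root)
```

Passing the CladeTime lookup in as a callable keeps the sketch independent of CladeTime's constructor. In the real script, `load_from_cladetime` would wrap whatever CladeTime call currently retrieves the metadata. Restricting the fallback to pre-cutoff dates keeps unexpected failures on newer rounds visible.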

Testing / Definition of Done

Run the workflow manually with nowcast_date=2025-10-15 and verify that all 14 rounds (July 16 through October 15) generate target data successfully.
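To sanity-check the round list before the manual run, here is a small sketch that enumerates the 14 round dates ending at nowcast_date. It assumes rounds are exactly one week apart; `round_dates` is a hypothetical helper, not part of the workflow.

```python
from datetime import date, timedelta


def round_dates(nowcast_date: date, n_rounds: int = 14) -> list[date]:
    """Return n_rounds weekly round dates ending at nowcast_date, oldest first."""
    return [nowcast_date - timedelta(weeks=k) for k in range(n_rounds - 1, -1, -1)]


rounds = round_dates(date(2025, 10, 15))
print(rounds[0], rounds[-1], len(rounds))  # 2025-07-16 2025-10-15 14
```

Under the weekly assumption, the window starts on July 16 and includes the failed 2025-08-06 round.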

Related
