
Update and standardize Nextclade datasets for all lineages and gene segments #186

@huddlej


Context

Only H3N2 and H1N1pdm currently have Nextclade datasets for all 8 gene segments; B/Vic has HA and NA datasets, and B/Yam has only HA. The default HA and NA datasets for H3N2 and H1N1pdm use more modern reference strains (e.g., A/Darwin/6/2021 for H3N2), while the other segments use older reference strains (e.g., A/NewYork/392/2004 for all other H3N2 genes).

Only the HA and NA datasets include shortcut aliases like flu_h3n2_ha; the other gene segments have only a single longer name like nextstrain/flu/h3n2/ns.
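To make the naming gap concrete, here is a minimal Python sketch that derives both name forms for a lineage and segment. The patterns are inferred only from the two examples above (`flu_h3n2_ha` and `nextstrain/flu/h3n2/ns`); the function names are hypothetical, and real dataset paths may carry additional components.

```python
def dataset_name(lineage: str, segment: str) -> str:
    """Long dataset name, following the pattern of nextstrain/flu/h3n2/ns."""
    return f"nextstrain/flu/{lineage}/{segment}"


def shortcut_alias(lineage: str, segment: str) -> str:
    """Shortcut alias, following the pattern of flu_h3n2_ha (assumed to generalize)."""
    return f"flu_{lineage}_{segment}"


if __name__ == "__main__":
    # Today only HA and NA have the second form; the proposal is for every segment to have both.
    print(dataset_name("h3n2", "ns"), shortcut_alias("h3n2", "ns"))
```

Under this assumed pattern, standardizing would mean every segment gets an alias such as `flu_h3n2_ns` alongside its long name.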

Description

We should update the Nextclade datasets for all lineages and genes to use the same modern reference strains and provide the same shortcut aliases.

This means getting additional gene sequences and coordinates for A/Darwin/6/2021 (H3N2), A/Wisconsin/588/2019 (H1N1pdm), and B/Brisbane/60/2008 (B/Vic). For completeness, we could add all remaining genes for B/Wisconsin/01/2010 (B/Yam). We wouldn't use the Yam dataset for surveillance analyses, but it could be helpful for historical analyses.
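The scope of that work can be sketched as a small Python table of target references and the segments still needing sequences and coordinates. The strain names come from this issue. The lowercase segment labels, the lineage keys, and the assumption that the existing B/Vic and B/Yam datasets already use the strains named above are illustrative guesses, not confirmed facts.

```python
# Assumed lowercase labels for the 8 influenza gene segments.
SEGMENTS = ["pb2", "pb1", "pa", "ha", "np", "na", "mp", "ns"]

# Target reference strain per lineage, as named in this issue.
TARGET_REFERENCE = {
    "h3n2": "A/Darwin/6/2021",
    "h1n1pdm": "A/Wisconsin/588/2019",
    "vic": "B/Brisbane/60/2008",
    "yam": "B/Wisconsin/01/2010",
}

# Segments whose current dataset is taken to already use the target reference.
# For vic and yam this is an assumption; the issue only says which datasets exist.
ON_TARGET = {
    "h3n2": {"ha", "na"},
    "h1n1pdm": {"ha", "na"},
    "vic": {"ha", "na"},
    "yam": {"ha"},
}


def segments_to_add(lineage: str) -> list[str]:
    """Segments still needing sequences and coordinates for the target reference."""
    return [s for s in SEGMENTS if s not in ON_TARGET[lineage]]


if __name__ == "__main__":
    for lineage, strain in TARGET_REFERENCE.items():
        print(f"{lineage} ({strain}): {', '.join(segments_to_add(lineage))}")
```

Under these assumptions, H3N2, H1N1pdm, and B/Vic each need six additional segments, and B/Yam needs seven.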

If we wanted to update the references to newer strains for all subtypes, this could be a good time to make that change, too.


Labels

enhancement (New feature or request)
