This repository provides a complete ingest and analysis pipeline for Human coronavirus OC43 (HCoV-OC43), built using Snakemake and based on the Nextstrain framework.
It includes tools to download and process OC43 sequences and metadata, generate Nextstrain-compatible inputs, and perform genomic analysis and visualization.
The data for this analysis is available from NCBI Virus. Instructions for downloading sequences are provided under Sequences.
Ensure the following are installed:
- Python 3.8 or higher
- Micromamba or Conda
- Snakemake 7
- Nextstrain CLI
Install the Nextstrain environment by following these instructions.
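Once everything is installed, a quick check along these lines can confirm that the tools are reachable from your shell. This is a minimal sketch, not part of the repository: the helper names have and check_prereqs are hypothetical, and the executable names (python3, snakemake, nextstrain, micromamba, conda) are assumptions about how the tools are invoked on your system. It does not verify version numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report which required command-line tools are on PATH.
# The executable names below are assumptions; adjust them to your setup.

# Succeeds if the named command can be found by the shell.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Prints one line per requirement and returns non-zero if anything is missing.
check_prereqs() {
    missing=0
    for tool in python3 snakemake nextstrain; do
        if have "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Either Micromamba or Conda is sufficient.
    if have micromamba || have conda; then
        echo "found:   micromamba or conda"
    else
        echo "missing: micromamba or conda"
        missing=1
    fi
    return "$missing"
}
```

After sourcing the snippet, run check_prereqs; versions can then be inspected by hand with python3 --version and snakemake --version.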
You can download OC43 sequences:
- Manually from NCBI Virus
- Automatically using the ingest pipeline. Please refer to the README in the ingest folder.
Found in the config and data subdirectories:
- config.yaml: Pipeline configuration
- geo_regions.tsv, lat_longs.tsv: Geographical mappings
- colors.tsv: Color palette for Nextstrain builds
- clades_genome.tsv: For manually labeling clades
- dropped_strains.txt: List of strains to exclude
- auspice_config.json: Required for visualization
- reference_sequence.gb: Reference file for OC43 (from GenBank: AY391777)
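Before starting a build, it can help to confirm that the files listed above are present. The sketch below is illustrative only: find_input and check_inputs are hypothetical helpers, and it assumes you run it from the repository root with each file sitting directly under config/ or data/. Since the list does not say which directory holds which file, both are searched.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: locate the expected input files.
# Assumption: files live directly under config/ or data/ in the current directory.

# Prints the path of the first match in config/ or data/; fails if none exists.
find_input() {
    for dir in config data; do
        if [ -f "$dir/$1" ]; then
            echo "$dir/$1"
            return 0
        fi
    done
    return 1
}

# Checks every file named in the list above; returns non-zero if any is missing.
check_inputs() {
    status=0
    for name in config.yaml geo_regions.tsv lat_longs.tsv colors.tsv \
                clades_genome.tsv dropped_strains.txt auspice_config.json \
                reference_sequence.gb; do
        if path=$(find_input "$name"); then
            echo "ok       $path"
        else
            echo "missing  $name"
            status=1
        fi
    done
    return "$status"
}
```

Running check_inputs prints one line per file, so a missing mapping or reference file shows up before Snakemake is started.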
You can perform either a whole-genome or protein-specific build of OC43 using Nextstrain.
micromamba activate nextstrain
snakemake --cores 9 all
snakemake auspice/HCoV_OC43_genexy.json --cores 9
snakemake auspice/HCoV_OC43_whole_genome.json --cores 9
auspice view --datasetDir auspice
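Because a full build can take a while, it may be worth previewing what Snakemake plans to do before running it. The following sketch wraps the build commands above in a hypothetical helper, build_target, which performs a dry run (Snakemake's --dry-run option) and only starts the real build if the dry run succeeds. The CORES variable is an assumption introduced here; it defaults to 9 to match the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dry-run a Snakemake target, then build it.
# CORES is an illustrative variable (not defined by the pipeline); default 9.

build_target() {
    target=$1
    cores=${CORES:-9}
    # Show the planned jobs without executing them; stop if this fails.
    snakemake --dry-run --cores "$cores" "$target" || return 1
    # Run the actual build.
    snakemake --cores "$cores" "$target"
}
```

For example, build_target auspice/HCoV_OC43_whole_genome.json previews and then builds the whole-genome dataset, and build_target all does the same for the full pipeline.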
To run two visualizations simultaneously, you may need to set the port:
export PORT=4001
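As an alternative to exporting PORT for the whole session, the variable can be set for a single command, which makes it easier to keep two viewers apart. This is a small sketch under that assumption; view_on_port is a hypothetical helper, and the dataset directory defaults to auspice as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start an Auspice viewer on a chosen port.
# Usage: view_on_port PORT [DATASET_DIR]   (DATASET_DIR defaults to "auspice")

view_on_port() {
    port=$1
    dir=${2:-auspice}
    # PORT is set only for this one invocation of auspice.
    PORT="$port" auspice view --datasetDir "$dir"
}
```

Run auspice view as usual in one terminal and view_on_port 4001 in a second terminal to browse two builds side by side.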
📬 Contact
For questions or support, please contact: nosihle.msomi@swisstph.ch