hodcroftlab/nextclade_d68

Nextclade Setup for Enterovirus D68

Folder Structure

First, create the necessary folder structure as shown in the example workflow:

dataset/
profiles/
resources/
rules/
scripts/
results/

You can create these directories using the following command:

mkdir -p dataset profiles resources rules scripts results

Steps to Set Up The Workflow

1. Run generate_from_genbank.py

This script (located in scripts/) generates reference files from GenBank.

Run the following command:

python3 scripts/generate_from_genbank.py --reference "AY426531.1" --output-dir dataset/

During execution, the script may prompt you for CDS annotations. Use the following codes to specify the CDS automatically:

  • [0]
  • [product] or [leave empty for manual choice] to select proteins.
  • [2].

The script will generate:

  • dataset/reference.fasta
  • dataset/genome_annotation.gff3
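
Before moving on, it can help to confirm that both files were written and look plausible. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the expectations that the reference holds exactly one sequence and that the annotation has at least one CDS row are assumptions.

```python
from pathlib import Path


def read_fasta_records(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def count_gff3_features(path, feature_type="CDS"):
    """Count rows in a GFF3 file whose third column equals feature_type."""
    count = 0
    for line in Path(path).read_text().splitlines():
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) >= 3 and columns[2] == feature_type:
            count += 1
    return count


def check_reference_files(dataset_dir="dataset"):
    """Return a list of problems found in the generated reference files."""
    dataset = Path(dataset_dir)
    fasta = dataset / "reference.fasta"
    gff3 = dataset / "genome_annotation.gff3"
    problems = []
    if not fasta.is_file():
        problems.append(f"missing {fasta}")
    elif len(read_fasta_records(fasta)) != 1:
        # Assumption: the reference is a single sequence.
        problems.append(f"{fasta} should contain exactly one sequence")
    if not gff3.is_file():
        problems.append(f"missing {gff3}")
    elif count_gff3_features(gff3) == 0:
        # Assumption: at least one CDS was selected during the prompts.
        problems.append(f"{gff3} contains no CDS features")
    return problems
```

Calling `check_reference_files("dataset")` returns an empty list when both files pass these checks.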

2. Update pathogen.json

Modify pathogen.json to:

  • Ensure file names match the generated reference files.
  • Update attributes as needed.
  • Adjust the Quality Control (QC) settings if necessary. If QC is not configured, Nextclade will not perform any checks.

For more details on configuration, refer to the Nextclade documentation.
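
To check that the file names in pathogen.json match what is on disk, a short script can compare the two. This is a sketch under assumptions: it expects pathogen.json to sit in dataset/ and to list file names under a top-level "files" object, and the function name is hypothetical.

```python
import json
from pathlib import Path


def missing_dataset_files(dataset_dir="dataset"):
    """Return file names referenced in pathogen.json that are absent from dataset_dir."""
    dataset = Path(dataset_dir)
    config = json.loads((dataset / "pathogen.json").read_text())
    # Assumption: file names are listed as values of a top-level "files" object.
    files = config.get("files", {})
    return sorted(name for name in files.values() if not (dataset / name).is_file())
```

Files that the workflow only produces at a later step may be reported as missing at this stage.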


3. Prepare reference.gb

  • Copy the reference.gb file into the resources/ directory.
  • Modify protein names as needed to match your requirements.
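
When renaming proteins, it may be useful to list the product names currently present in the file. The sketch below uses only plain text matching on `/product="..."` qualifiers rather than a GenBank parser; the function name is hypothetical and the path resources/reference.gb follows the step above.

```python
import re
from pathlib import Path


def list_product_names(genbank_path="resources/reference.gb"):
    """Return the values of all /product qualifiers in a GenBank flat file."""
    text = Path(genbank_path).read_text()
    # Qualifier values can wrap across lines, so match up to the closing quote.
    names = re.findall(r'/product="([^"]*)"', text)
    # Collapse line breaks and indentation inside wrapped values.
    return [" ".join(name.split()) for name in names]
```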

4. Update the Snakefile

  • Modify lines 1-18 to adjust paths and parameters.
  • Ensure all necessary files for the Augur pipeline are present, including:
    • sequences.fasta & metadata.tsv
      • can be downloaded from NCBI Virus via ingest when FETCH_SEQUENCES is set to True
    • auspice_config.json
  • These files are essential for building the reference tree and running Nextclade.

Ingest

The ingest process downloads sequences and metadata from NCBI Virus. For more details, refer to the Ingest Documentation.

The following packages must be installed to run the ingest process:

conda-forge/bioconda: csvtk, nextclade, tsv-utils, seqkit, zip, unzip, entrez-direct, ncbi-datasets-cli
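
A quick way to see which of these packages are still missing is to look for one executable from each on the PATH. This is a sketch: the mapping from package to executable is an assumption (for example, tsv-utils is checked via tsv-filter, entrez-direct via esearch, and ncbi-datasets-cli via datasets), and the function name is hypothetical.

```python
import shutil

# Package name -> one executable it is assumed to provide.
EXPECTED_EXECUTABLES = {
    "csvtk": "csvtk",
    "nextclade": "nextclade",
    "tsv-utils": "tsv-filter",
    "seqkit": "seqkit",
    "zip": "zip",
    "unzip": "unzip",
    "entrez-direct": "esearch",
    "ncbi-datasets-cli": "datasets",
}


def find_missing_packages(expected=EXPECTED_EXECUTABLES, which=shutil.which):
    """Return the packages whose executable cannot be found on the PATH."""
    return sorted(pkg for pkg, exe in expected.items() if which(exe) is None)
```

Running `print(find_missing_packages())` inside the activated conda environment lists anything that still needs to be installed.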

Running the Snakefile

To create the auspice JSON and a Nextclade example dataset:

snakemake --cores 9 all

This runs Nextclade on the example sequences in out-dataset/sequences.fasta using the dataset in the dataset/ directory. The results are saved to the test_out directory and contain the alignment, the aligned translations, and a summary TSV file.
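
The summary TSV can be inspected with a few lines of Python. The sketch below tallies the overall QC status per sequence; it assumes the summary is written to test_out/nextclade.tsv and contains a `qc.overallStatus` column, and the function name is hypothetical.

```python
import csv
from collections import Counter


def summarize_qc(tsv_path="test_out/nextclade.tsv", column="qc.overallStatus"):
    """Count how many sequences fall into each value of the given column."""
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Rows without a value (or without the column) are counted as "missing".
        return Counter(row.get(column) or "missing" for row in reader)
```

For example, `summarize_qc()` returns a `Counter` keyed by the status values found in the file.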

Visualizing the Nextclade build

One can also use the dataset in Nextclade Web by hosting the dataset through a local web server. For example, after having installed node and run npm install -g serve, one can host the dataset via:

serve --cors out-dataset -l 3000

And open Nextclade Web with a URL parameter dataset-url pointing to the local web server:

https://master.clades.nextstrain.org/?dataset-url=http://localhost:3000

Once the web page loads, you can click "Load example" and then click run to test. You may want to reduce the maximum number of nucleotide markers to 500 to prevent Nextclade from freezing (click "Settings" at the top right, then select the "Sequence view" tab and reduce "Max. nucleotide markers" to 500).
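
If node is not available, the hosting step can likely be reproduced with the Python standard library. This is a sketch of an alternative to serve, not what the repository uses: the class and function names are hypothetical, and it assumes that a single `Access-Control-Allow-Origin: *` header is enough for Nextclade Web to fetch the dataset.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that allows cross-origin requests."""

    def end_headers(self):
        # Assumption: allowing any origin is sufficient for Nextclade Web.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory="out-dataset", port=3000, host="localhost"):
    """Create (but do not start) a server that hosts the given directory."""
    handler = functools.partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Calling `make_server().serve_forever()` hosts out-dataset on port 3000, matching the dataset-url shown above; stop it with Ctrl+C.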


This guide provides a structured workflow for setting up Nextclade for Enterovirus D68. If you encounter issues, refer to the official documentation.

About

Snakemake pipeline for creating a Nextclade dataset for Enterovirus D68
