viral_usher is a command-line tool to set up and run a pipeline to build an UShER tree for a new viral species (or type, subtype, etc.) using genomes downloaded from NCBI.
- Subcommands:
init
: Generate a config file (interactive or via command line options)
build
: Download sequences and build a tree, guided by the config file
- Uses Docker for portability to laptops, servers, or cloud platforms
- Install prerequisites (if not already installed)
- Docker
- Python version 3.11 or later (we highly recommend using an environment manager such as venv, miniconda, or mamba)
- Install with pip (again, we highly recommend using an environment manager):
pip install viral_usher
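Before installing, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch for a POSIX shell, not part of viral_usher itself: `have_cmd` is a hypothetical helper, and `python3` as the interpreter name is an assumption about your system.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Docker is listed as a prerequisite.
have_cmd docker || echo "Docker not found on PATH" >&2

# Python must be 3.11 or later; the name "python3" is an assumption.
if have_cmd python3; then
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 11) else 1)' \
        || echo "Python 3.11 or later is required" >&2
else
    echo "python3 not found on PATH" >&2
fi
```

The sketch only prints warnings to standard error; it does not install or change anything.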
If you want to start by just naming a virus, and let viral_usher interactively help you identify the right reference sequence, Taxonomy ID etc., then simply run
viral_usher init
and reply to the prompts.
Alternatively, if you already know your parameters, you can skip the interactive prompts by passing in command line options. Run viral_usher --help to get a listing of options. Here is an example that builds a tree for the Chikungunya virus using RefSeq NC_004162.2, all genomes available from GenBank for the Taxonomy ID associated with NC_004162.2 (Taxonomy ID 37124), plus additional sequences from example/hypothetical_chikungunya.fasta (in this repository):
git clone https://github.com/AngieHinrichs/viral_usher.git
cd viral_usher
viral_usher init \
--refseq NC_004162.2 \
--workdir chikungunya \
--fasta example/hypothetical_chikungunya.fasta \
--config chikungunya/config.toml
Continuing the Chikungunya virus example:
viral_usher build --config chikungunya/config.toml
That's all! viral_usher will create the following files in workdir (chikungunya in our example):
- a tree in UShER protobuf format (optimized.pb.gz)
- a metadata file in TSV format (metadata.tsv.gz)
- a Taxonium tree file that you can view using https://taxonium.org/ (tree.jsonl.gz)
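After a build finishes, a quick check that the three files listed above exist can catch an incomplete run. This is a minimal sketch for a POSIX shell: `check_outputs` and `preview_metadata` are hypothetical helpers, not viral_usher commands, and the only assumptions are the file names given above and that `gzip` and `head` are available.

```shell
# Hypothetical helper: report which expected outputs exist (and are non-empty)
# in the given workdir; returns non-zero if any is missing.
check_outputs() {
    workdir="$1"
    status=0
    for f in optimized.pb.gz metadata.tsv.gz tree.jsonl.gz; do
        if [ -s "$workdir/$f" ]; then
            echo "found: $workdir/$f"
        else
            echo "missing: $workdir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Hypothetical helper: print the first few lines of the gzipped metadata TSV.
preview_metadata() {
    gzip -dc "$1/metadata.tsv.gz" | head -n 5
}
```

For the Chikungunya example, this could be run as `check_outputs chikungunya && preview_metadata chikungunya`.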
To view the example Chikungunya virus tree in Taxonium, click here. Type or copy-paste "hypothetical" into Taxonium's Name search input to find the sequences from example/hypothetical_chikungunya.fasta.
# Clone the repo
git clone https://github.com/AngieHinrichs/viral_usher.git
cd viral_usher
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest