Wastewater-Based Epidemiology using Phylogenetic Placements

Introduction

WEPP (Wastewater-Based Epidemiology using Phylogenetic Placements) is a phylogeny-based pipeline that estimates haplotype proportions from wastewater sequencing reads using a mutation-annotated tree (MAT) (Figure 1A). By improving the resolution of pathogen variant detection, WEPP enables critical epidemiological applications previously feasible only through clinical sequencing. It also flags potential novel variants via Unaccounted Mutations, which can be examined at the read level using the interactive dashboard (Figure 1B).

WEPP begins by placing reads on the mutation-annotated tree (MAT) and identifying an initial set of candidate haplotypes. It expands this set by including neighbors around each selected haplotype to form a candidate pool, which is passed to a deconvolution algorithm to estimate haplotype abundances. Haplotypes above a frequency threshold are retained, and their neighbors are again added to form a new candidate pool. This process is repeated iteratively until the haplotype set stabilizes or the maximum number of iterations is reached (Figure 1C).

Figure 1: Overview of WEPP

Installation

WEPP offers multiple installation methods. Using Docker is recommended to avoid conflicts with existing packages.

- Docker image from DockerHub
- Dockerfile
- Shell Commands

⚠️ The Docker image is currently built for the linux/amd64 platform. While it can run on arm64 systems (e.g., Apple Silicon or Linux aarch64) via emulation, this may lead to reduced performance.
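To see in advance whether the image will run natively or under emulation, a small check along these lines can help. This is only a sketch: the function name `wepp_platform_note` is hypothetical and not part of WEPP, and it relies on nothing beyond `uname -m`. The `--platform` option shown in the comment is a standard `docker run` flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of WEPP): report whether a machine
# architecture matches the linux/amd64 image or would need emulation.
wepp_platform_note() {
    local arch="${1:-$(uname -m)}"   # default to the current host
    case "$arch" in
        x86_64|amd64)  echo "native" ;;
        arm64|aarch64) echo "emulated" ;;
        *)             echo "unknown" ;;
    esac
}

echo "Host architecture: $(uname -m) -> $(wepp_platform_note)"

# On arm64 hosts the platform can be requested explicitly, for example:
#   docker run --platform linux/amd64 -it pranavgangwar/wepp:latest
```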

Option-1: Install via DockerHub

The Docker image includes all dependencies required to run WEPP.

Step 1: Get the image from DockerHub

docker pull pranavgangwar/wepp:latest

Step 2: Start and run Docker container

# Use this command if your datasets can be downloaded from the Web
docker run -it pranavgangwar/wepp:latest

# Use this command if your datasets are present in your current directory
docker run -it -v "$PWD":/WEPP -w /WEPP pranavgangwar/wepp:latest

Step 3: Confirm that the installation works by running

snakemake test --cores 1 --use-conda

Option-2: Install via Dockerfile

The Dockerfile contains all dependencies required to run WEPP.

Step 1: Clone the repository

git clone --recurse-submodules https://github.com/TurakhiaLab/WEPP.git 
cd WEPP

Step 2: Build a Docker Image

cd docker
docker build -t wepp . 
cd ..

Step 3: Start and run Docker container

# Use this command if your datasets can be downloaded from the Web
docker run -it wepp

# Use this command if your datasets are present in your current directory
docker run -it -v "$PWD":/workspace -w /workspace wepp

Option-3: Install via Shell Commands (requires sudo access)

Users without sudo access are advised to install WEPP via the Docker image.

Step 1: Clone the repository

git clone --recurse-submodules https://github.com/TurakhiaLab/WEPP.git
cd WEPP

Step 2: Install dependencies (might require sudo access)

WEPP depends on the following common system libraries and tools, which are typically pre-installed on most development environments:

- wget
- curl
- pip
- build-essential 
- python3-pandas
- pkg-config
- zip
- cmake 
- libtbb-dev
- libprotobuf-dev
- protobuf-compiler
- snakemake
- conda

For Ubuntu users with sudo access, if any of the required libraries are missing, you can install them with:

sudo apt-get install -y wget pip curl python3-pip build-essential python3-pandas pkg-config zip cmake libtbb-dev libprotobuf-dev protobuf-compiler snakemake

If your system doesn't have Conda, you can install it with:

wget -O Miniforge3.sh "https://github.com/conda-forge/miniforge/releases/download/24.11.3-2/Miniforge3-24.11.3-2-Linux-x86_64.sh"
bash Miniforge3.sh -b -p "${HOME}/conda"

source "${HOME}/conda/etc/profile.d/conda.sh"
source "${HOME}/conda/etc/profile.d/mamba.sh"
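Once the dependencies are installed, a quick PATH check can confirm that the command-line tools are visible in the current shell. The sketch below is not part of WEPP: `missing_tools` is a hypothetical helper, and the tool list is an assumption derived from the dependency list above (`protoc` is the binary shipped by protobuf-compiler; libraries such as libtbb-dev cannot be checked this way).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH. Prints nothing if all are present.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed list of executables, based on the dependencies named above.
missing="$(missing_tools wget curl zip cmake pkg-config protoc snakemake conda)"
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All checked tools are available."
fi
```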

Quick Start

The following steps will download a real wastewater RSVA dataset and analyze it with WEPP.

Step 1: Download the test dataset

mkdir -p data/RSVA_real
cd data/RSVA_real
# The FTP URL is quoted so that wget, not the local shell, expands the * wildcard
wget "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR147/011/ERR14763711/ERR14763711_*.fastq.gz" \
     https://hgdownload.gi.ucsc.edu/hubs/GCF/002/815/475/GCF_002815475.1/UShER_RSV-A/2025/04/25/rsvA.2025-04-25.pb.gz \
     https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/815/475/GCF_002815475.1_ASM281547v1/GCF_002815475.1_ASM281547v1_genomic.fna.gz
gunzip GCF_002815475.1_ASM281547v1_genomic.fna.gz 
mv ERR14763711_1.fastq.gz ERR14763711_R1.fastq.gz
mv ERR14763711_2.fastq.gz ERR14763711_R2.fastq.gz
cd ../../

This saves the datasets in a separate data/RSVA_real folder within the repository.

Step 2: Run the pipeline

snakemake --config DIR=RSVA_real FILE_PREFIX=test_run PRIMER_BED=RSVA_all_primers_best_hits.bed TREE=rsvA.2025-04-25.pb.gz REF=GCF_002815475.1_ASM281547v1_genomic.fna CLADE_IDX=0 --cores 32 --use-conda

Step 3: Analyze Results

All results generated by WEPP can be found in the results/RSVA_real directory.

User Guide

Organizing Data

We assume that all wastewater samples are organized in the data directory, each within its own subdirectory named by the DIR argument (see Run Command). For each sample, WEPP generates intermediate and output files in corresponding subdirectories under intermediate and results, respectively. Each DIR created inside data is expected to contain the following files:

Sequencing Reads: Ending with *R{1/2}.fastq.gz for paired-ended reads and *.fastq.gz for single-ended.
Reference Genome fasta
Mutation-Annotated Tree (MAT)
[OPTIONAL] Genome Masking File: mask.bed, whose third column specifies sites to be excluded from analysis.

Visualization of WEPP's workflow directories

📁 WEPP
├───📁data                                # [User Created] Contains data to analyze
│   ├───📁SARS-CoV-2_test_1               # SARS-CoV-2 run wastewater samples
│   │   ├───sars_cov_2_reads.fastq.gz     # Single-ended reads
│   │   ├───sars_cov_2_reference.fa
│   │   ├───mask.bed                      # OPTIONAL
│   │   └───sars_cov_2_mat.pb.gz
│   └───📁RSVA_test_1                     # RSVA run wastewater samples
│       ├───rsva_reads_R1.fastq.gz        # Paired-ended reads
│       ├───rsva_reads_R2.fastq.gz        # Paired-ended reads
│       ├───rsva_reference.fa
│       └───rsva_mat.pb.gz
│
├───📁intermediate                        # [WEPP Generated] Contains intermediate stage files
│   ├───📁SARS-CoV-2_test_1
│   │   ├───file_1
│   │   └───file_2
│   └───📁RSVA_test_1
│       ├───file_1
│       └───file_2
│
└───📁results                             # [WEPP Generated] Contains final WEPP results
    ├───📁SARS-CoV-2_test_1
    │   ├───file_1
    │   └───file_2
    └───📁RSVA_test_1
        ├───file_1
        └───file_2
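Before launching a run, it can be useful to confirm that a sample directory holds the inputs listed above. The following is a minimal sketch, not a WEPP feature: `has_match` and `check_sample_dir` are hypothetical helpers, and the accepted file suffixes (`.fa`, `.fna`, `.fasta` for the reference; `.pb`, `.pb.gz` for the MAT) are assumptions based on the example file names in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR contains a file ending in any SUFFIX.
has_match() {
    local dir="$1" suffix f
    shift
    for suffix in "$@"; do
        for f in "$dir"/*"$suffix"; do
            [ -e "$f" ] && return 0
        done
    done
    return 1
}

# Hypothetical helper: report missing inputs for one sample directory.
# Returns non-zero if a required input is absent.
check_sample_dir() {
    local dir="$1" status=0
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    has_match "$dir" .fastq.gz       || { echo "no sequencing reads (*.fastq.gz)"; status=1; }
    has_match "$dir" .fa .fna .fasta || { echo "no reference genome FASTA"; status=1; }
    has_match "$dir" .pb .pb.gz      || { echo "no mutation-annotated tree (*.pb or *.pb.gz)"; status=1; }
    [ -e "$dir/mask.bed" ] || echo "note: no mask.bed (optional)"
    return "$status"
}

check_sample_dir "data/RSVA_real" || echo "Fix the layout before running snakemake."
```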

WEPP Arguments

The WEPP Snakemake pipeline requires the following arguments, which can be provided either via the configuration file (config/config.yaml) or passed directly on the command line using the --config argument. The command line arguments take precedence over the config file.

DIR - Folder name containing the wastewater reads
FILE_PREFIX - File Prefix for all intermediate files
REF - Reference Genome in fasta
TREE - Mutation-Annotated Tree
SEQUENCING_TYPE - Sequencing read type (s:Illumina single-ended, d:Illumina double-ended, or n:ONT long reads)
PRIMER_BED - BED file for primers from the primers folder
MIN_AF - Alleles with an allele frequency below this threshold in the reads will be masked.
MIN_Q - Alleles with a Phred score below this threshold in the reads will be masked.
MAX_READS - Maximum number of reads considered by WEPP from the sample. Helpful for reducing runtime
CLADE_IDX - Index used for assigning clades to selected haplotypes from the MAT. Generally '1' for SARS-CoV-2 MATs and '0' for others. To check, run "matUtils summary -i {TREE} -C {FILENAME}" and use '0' for annotation_1 or '1' for annotation_2.
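When choosing MIN_Q, it may help to recall that a Phred score Q corresponds to an error probability of 10^(-Q/10). The snippet below is a small illustration of that standard relationship only; `phred_to_error_prob` is a hypothetical helper and says nothing about how WEPP applies the threshold internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Phred quality score to the
# corresponding error probability, p = 10^(-Q/10).
phred_to_error_prob() {
    awk -v q="$1" 'BEGIN { printf "%.4g\n", 10 ^ (-q / 10) }'
}

for q in 10 20 30; do
    echo "MIN_Q=$q -> error probability $(phred_to_error_prob "$q")"
done
```

For instance, a threshold of 20 masks alleles whose base calls have more than a 1 in 100 chance of being wrong.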

Run Command

WEPP's snakemake workflow requires DIR and FILE_PREFIX as config arguments through the command line, while the remaining ones can be taken from the config file. It also requires --cores from the command line, which specifies the number of threads used by the workflow.

Examples:

Using all the parameters from the config file

snakemake --config DIR=SARS-CoV-2_test_1 FILE_PREFIX=test_run --cores 32 --use-conda

Overriding MIN_Q and PRIMER_BED through command line

snakemake --config DIR=RSVA_test_1 FILE_PREFIX=test_run MIN_Q=25 PRIMER_BED=none.bed --cores 32 --use-conda
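Since each sample lives in its own subdirectory of data, the run command can be repeated for every sample with a simple loop. The wrapper below is a sketch under stated assumptions: `run_all_samples` and the `DRY_RUN` variable are hypothetical, all remaining parameters are assumed to come from the config file, and by default the commands are only printed so they can be reviewed first. For real runs the first argument should stay `data`, because DIR is resolved relative to that directory.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build one snakemake command per sample directory.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
run_all_samples() {
    local data_root="${1:-data}" prefix="${2:-test_run}" cores="${3:-32}" sample
    for sample in "$data_root"/*/; do
        [ -d "$sample" ] || continue          # skip if the glob matched nothing
        sample="$(basename "$sample")"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "snakemake --config DIR=$sample FILE_PREFIX=$prefix --cores $cores --use-conda"
        else
            snakemake --config DIR="$sample" FILE_PREFIX="$prefix" --cores "$cores" --use-conda
        fi
    done
}

# Print the commands for every sample under data/.
run_all_samples data test_run 32
```

Setting `DRY_RUN=0` before the call executes the commands one sample at a time.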

Contributions

We welcome contributions from the community to enhance the capabilities of WEPP. If you encounter any issues or have suggestions for improvement, please open an issue on the WEPP GitHub page. For general inquiries and support, reach out to our team.

Citing WEPP

TBA.
