Automated SRA downloading, processing and Freyja analysis pipeline for SARS-CoV-2 wastewater sequencing data.
git clone https://github.com/andersen-lab/Freyja-SC2.git
cd Freyja-SC2
nextflow run main.nf -entry [fetch|rerun_demix] -profile [docker|singularity] --accession_list [accession_list.csv] --output_dir [output_dir] --num_samples [num_samples]
- -entry: The pipeline entry point.
  - fetch will download, process and run Freyja on the provided SRA accessions.
    - --sra_metadata: A CSV file containing a list of SRA accessions to download and process. The CSV file should have a header row containing at least the following metadata fields:
      - accession: The SRA accession number (e.g. SRR1234567).
      - sample_status: The status of the sample (e.g. "to_run", "completed").
      - amplicon_PCR_primer_scheme: The amplicon primer scheme used for the sample (e.g. "ARTIC V4", "ARTIC V5.3.2", "unknown").
    - --num_samples: The number of samples to process per run (default: 100).
  - rerun_demix will run the freyja demix step on previously generated variants output files in the provided variants directory. This is useful if you want to run Freyja on existing data with a different barcode set.
    - --variants_dir must contain files in the format [base_name].variants.tsv and [base_name].depths.tsv for each sample.
- --output_dir: The final output directory. Creates variants and demix subdirectories containing the respective output files (default: ./outputs).
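Before launching a fetch run, it can help to confirm that the metadata CSV has the header fields listed for --sra_metadata. The following is a minimal sketch, not part of the pipeline: the helper name missing_fields and the sample rows are hypothetical, and only the three column names come from the description above.

```python
import csv
import io

# Column names taken from the --sra_metadata description above.
REQUIRED_FIELDS = ("accession", "sample_status", "amplicon_PCR_primer_scheme")


def missing_fields(handle):
    """Return the required column names that are absent from the CSV header row."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in REQUIRED_FIELDS if name not in header]


# Illustrative rows only; these accessions are placeholders, not real samples.
example_csv = io.StringIO(
    "accession,sample_status,amplicon_PCR_primer_scheme\n"
    "SRR1234567,to_run,ARTIC V4\n"
    "SRR1234568,completed,unknown\n"
)

# An empty list means every required column is present.
print(missing_fields(example_csv))
```

To check a file on disk, pass an open file handle instead of the in-memory example, e.g. open("accession_list.csv", newline="").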
Additional configuration options can be found in nextflow.config.
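For rerun_demix, each sample in --variants_dir needs both a [base_name].variants.tsv and a [base_name].depths.tsv file. The sketch below is a hypothetical pre-flight check (the function name unpaired_samples is an assumption, and the pipeline itself may handle incomplete pairs differently); it lists base names that have only one of the two files.

```python
import tempfile
from pathlib import Path

VARIANTS_SUFFIX = ".variants.tsv"
DEPTHS_SUFFIX = ".depths.tsv"


def unpaired_samples(variants_dir):
    """Return sorted base names that have only one of the two expected files."""
    root = Path(variants_dir)
    variants = {p.name[: -len(VARIANTS_SUFFIX)] for p in root.glob("*" + VARIANTS_SUFFIX)}
    depths = {p.name[: -len(DEPTHS_SUFFIX)] for p in root.glob("*" + DEPTHS_SUFFIX)}
    # Symmetric difference: names present in exactly one of the two sets.
    return sorted(variants ^ depths)


# Demonstration in a throwaway directory with placeholder sample names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sampleA.variants.tsv", "sampleA.depths.tsv", "sampleB.variants.tsv"):
        (Path(tmp) / name).touch()
    print(unpaired_samples(tmp))  # sampleB is reported because its depths file is absent
```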
freyja-sc2 is currently in the process of downloading and processing all publicly available SARS-CoV-2 wastewater data, fetched with the following search terms:
'(Wastewater[All Fields] OR wastewater metagenome[All Fields]) AND ("Severe acute respiratory syndrome coronavirus 2"[Organism] OR SARS-CoV-2[All Fields])'
In addition to the above search terms, we exclude accessions with any of the following metadata problems:
- Missing collection date
- Missing catchment size (ww_population)
- Missing location (geo_loc_name)
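The exclusion rules above can be sketched as a simple record filter. This is an illustration under stated assumptions, not the pipeline's actual filtering code: the column name collection_date and the set of strings treated as "missing" are assumptions, while ww_population and geo_loc_name are the field names given in the list. The records are made-up placeholders.

```python
# "collection_date" is an assumed column name; the other two come from the list above.
REQUIRED_METADATA = ("collection_date", "ww_population", "geo_loc_name")

# Assumed spellings of "no value"; the pipeline's real criteria may differ.
MISSING_VALUES = {"", "missing", "not collected", "not provided", "n/a", "na"}


def has_required_metadata(record):
    """Return True only if every required field holds a non-missing value."""
    for field in REQUIRED_METADATA:
        value = (record.get(field) or "").strip().lower()
        if value in MISSING_VALUES:
            return False
    return True


# Placeholder records for illustration only.
records = [
    {"accession": "SRR0000001", "collection_date": "2023-01-15",
     "ww_population": "250000", "geo_loc_name": "USA: California"},
    {"accession": "SRR0000002", "collection_date": "",
     "ww_population": "120000", "geo_loc_name": "USA: Ohio"},
    {"accession": "SRR0000003", "collection_date": "2023-02-01",
     "ww_population": "missing", "geo_loc_name": "USA: Texas"},
]

kept = [r["accession"] for r in records if has_required_metadata(r)]
print(kept)  # only the first record has all three fields populated
```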
To check the status of each accession, please refer to the sample_status column in data/all_metadata.tsv. All currently processed freyja outputs are publicly available via Google Cloud Storage at gs://outbreak-ww-data
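As a quick way to summarize processing progress, the sample_status column can be tallied. The sketch below assumes data/all_metadata.tsv is tab-separated with a header row (inferred from the file extension) and uses placeholder rows; the function name status_counts is hypothetical, and status values other than "to_run" and "completed" may exist in the real file.

```python
import csv
import io
from collections import Counter


def status_counts(handle):
    """Count rows per sample_status value in a tab-separated metadata table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["sample_status"] for row in reader)


# Placeholder rows for illustration only.
example_tsv = io.StringIO(
    "accession\tsample_status\n"
    "SRR0000001\tcompleted\n"
    "SRR0000002\tto_run\n"
    "SRR0000003\tcompleted\n"
)
print(status_counts(example_tsv))

# With the repository checked out, the same function can read the real file:
# with open("data/all_metadata.tsv", newline="") as handle:
#     print(status_counts(handle))
```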