nf-core/dms is a reproducible, scalable, and community-curated pipeline for analyzing deep mutational scanning (DMS) data from shotgun DNA sequencing. DMS lets researchers measure the fitness effects of thousands of gene variants simultaneously. This helps to classify disease-causing mutants in human and animal populations and to learn fundamental rules of virus evolution, protein architecture, splicing and small-molecule interactions.
While DNA synthesis and sequencing technologies have advanced substantially, long open reading frame (ORF) targets still present major challenges for DMS studies. Shotgun DNA sequencing can greatly speed up the inference of fitness landscapes for long ORF mutants, in theory without loss of accuracy. We designed the nf-core/dms pipeline to unlock the power of shotgun-sequencing-based DMS studies by simplifying and standardising the complex bioinformatics steps involved in processing such experiments, from read alignment to QC reporting and fitness landscape inference.
📄 Reference: Wehnert et al., bioRxiv preprint (coming soon)
- End-to-end analyses of DMS shotgun sequencing data
- Modular, three-stage workflow: alignment → QC → error-aware fitness estimation
- Integrates with popular statistical tools like DiMSum, Enrich2, Rosace and mutscan
- Supports multiple mutagenesis strategies, e.g. nicking by NNK and NNS codons
- Containerized via Docker, Singularity and Apptainer
- Scalable across HPC and Cloud systems
- Monitors CPU usage, memory usage, and CO₂ footprint
nf-core/dms uses Nextflow, which must be installed on your system:
java -version # Check that Java v11+ is installed
curl -s https://get.nextflow.io | bash # Download Nextflow
chmod +x nextflow # Make executable
mkdir -p ~/bin && mv nextflow ~/bin/ # Move into ~/bin (must be on your $PATH)
The pipeline itself requires no installation – Nextflow will fetch it directly from GitHub:
nextflow run nf-core/dms -profile docker
Prepare:
- A sample sheet CSV to specify input/output labels, replicates, etc. (see example)
- A reference FASTA file for the gene or region of interest
To execute nf-core/dms, run the basic command:
nextflow run nf-core/dms \
-profile singularity,local \
--input ./input.csv \
--outdir ./results \
--fasta ./ref.fa \
--reading-frame 1-300 \
--mutagenesis NNK-NNS \
--seq-rarefaction false
| Parameter | Description |
|---|---|
| `--input` | Path to sample sheet CSV |
| `--outdir` | Path to output directory |
| `--fasta` | Reference FASTA file |
| `--reading_frame` | Start and end nucleotide (e.g. `1-300`) |
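Because the reading-frame range is easy to mistype, it can help to check it before launching a run. The sketch below is a hypothetical helper, not part of nf-core/dms. It assumes the coordinates are 1-based and inclusive (so `1-300` covers 300 nucleotides) and that the range should span whole codons; adjust it if the pipeline defines the range differently.

```bash
# Hypothetical pre-flight check, not part of nf-core/dms.
# Assumptions: START-END is 1-based and inclusive, and should cover whole codons.
check_reading_frame() {
  local range="$1" start end len part

  # Require a hyphen; extra hyphens are rejected by the digit check below.
  case "$range" in
    *-*) ;;
    *) echo "error: expected START-END, got '$range'" >&2; return 1 ;;
  esac
  start="${range%%-*}"
  end="${range#*-}"

  # Each part must be a positive integer without leading zeros.
  for part in "$start" "$end"; do
    case "$part" in
      ''|0*|*[!0-9]*) echo "error: '$part' is not a positive integer" >&2; return 1 ;;
    esac
  done

  if [ "$end" -lt "$start" ]; then
    echo "error: end ($end) is before start ($start)" >&2
    return 1
  fi

  len=$(( end - start + 1 ))
  if [ $(( len % 3 )) -ne 0 ]; then
    echo "error: range spans $len nt, which is not a multiple of 3" >&2
    return 1
  fi

  echo "$(( len / 3 ))"   # number of codons covered by the range
}

check_reading_frame 1-300   # prints 100
```

A range such as `1-301` is rejected because 301 nucleotides do not divide into complete codons.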
| Parameter | Default | Description |
|---|---|---|
| `--read-align` | `bwa-mem` | Read aligner |
| `--mutagenesis` | `NNK-NNS` | Deep mutational scanning strategy used |
| `--seq-rarefaction` | `false` | Estimate sequencing saturation by rarefaction |
| `--error-estimation` | `input` | Error model used to correct 1nt counts |
| `--fitness-estimation` | `dimsum` | Downstream fitness inference module |
More options and advanced configuration: see vignette
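For readers new to the `NNK-NNS` value of `--mutagenesis`: NNK and NNS are degenerate codons in IUPAC notation, where N is any base, K is G or T, and S is C or G. The short sketch below is illustrative only and not part of the pipeline. It enumerates each codon set and counts how many codons and stop codons it contains; the helper name `count_codons` is made up for this example.

```bash
# Illustrative only: enumerate a degenerate codon set "NN<third>".
# $1 = space-separated bases allowed at the third codon position.
count_codons() {
  local third="$1" total=0 stops=0 a b c
  for a in A C G T; do
    for b in A C G T; do
      for c in $third; do            # unquoted on purpose: split into bases
        total=$(( total + 1 ))
        case "$a$b$c" in
          TAA|TAG|TGA) stops=$(( stops + 1 )) ;;   # standard stop codons
        esac
      done
    done
  done
  echo "$total codons, $stops stop"
}

count_codons "G T"   # NNK: prints "32 codons, 1 stop"
count_codons "C G"   # NNS: prints "32 codons, 1 stop"
```

Both sets still encode all 20 amino acids while excluding the TAA and TGA stop codons, which leaves TAG as the only stop codon in each library.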
The primary pipeline input is a sample sheet `.csv` file listing:
- Paths to paired-end `.fastq.gz` files from shotgun sequencing
- Their classification as either input or output samples
- Replicate IDs
- Associated experimental metadata
See sample CSV for formatting.
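Before a long run, a quick consistency check of the sample sheet can catch simple mistakes. The following is a hypothetical sketch, not part of nf-core/dms, and it deliberately avoids assuming any column names. It assumes a plain comma-separated file with one header row and no quoted commas, and that the listed FASTQ paths resolve from the directory where the check is run.

```bash
# Hypothetical pre-flight check, not part of nf-core/dms.
# Assumptions: plain comma-separated values (no quoted commas), one header
# row, and FASTQ paths that resolve from the current directory.
check_samplesheet() {
  local sheet="$1" status=0 path

  # 1) Every row must have as many fields as the header.
  awk -F',' '
    NR == 1 { n = NF; next }
    NF != n { printf "line %d: expected %d fields, found %d\n", NR, n, NF; bad = 1 }
    END     { exit bad + 0 }
  ' "$sheet" >&2 || status=1

  # 2) Every field ending in .fastq.gz must point to an existing file.
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    [ -f "$path" ] || { echo "missing FASTQ: $path" >&2; status=1; }
  done <<EOF
$(tail -n +2 "$sheet" | tr ',' '\n' | grep '\.fastq\.gz$')
EOF

  return "$status"
}

# Usage (once ./input.csv exists):
# check_samplesheet ./input.csv && echo "sample sheet looks consistent"
```

The function returns a non-zero status if any row has the wrong number of fields or any listed FASTQ file is missing, so it can gate the `nextflow run` command in a wrapper script.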
After execution, the pipeline creates the following directory structure:
results/
├── plots/ # PDF visualizations: coverage, variant heatmaps, etc.
├── intermediate_files/ # Raw alignments, filtered variant tables, QC reports
├── final_files/ # Fitness and error tables from downstream tools
├── timeline.html # Runtime timeline
└── report.html # Summary report incl. resource and CO₂ usage
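To confirm that a run produced the layout shown above, a small check like the following can be used. It is a hypothetical sketch rather than something shipped with the pipeline; the directory and file names are taken from the tree above, and the default `./results` matches the `--outdir` used in the example command.

```bash
# Hypothetical post-run check; names are taken from the output layout above.
check_outputs() {
  local outdir="${1:-./results}" status=0 item

  # Expected sub-directories.
  for item in plots intermediate_files final_files; do
    [ -d "$outdir/$item" ] || { echo "missing directory: $outdir/$item" >&2; status=1; }
  done

  # Expected reports; -s also fails on empty files.
  for item in timeline.html report.html; do
    [ -s "$outdir/$item" ] || { echo "missing or empty file: $outdir/$item" >&2; status=1; }
  done

  return "$status"
}

# Usage: check_outputs ./results && echo "all expected outputs present"
```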
If you use this pipeline in your research, please cite:
📄 Wehnert et al., bioRxiv preprint (coming soon)
Please also cite the nf-core framework:
📄 Ewels et al., Nature Biotechnology, 2020
https://doi.org/10.1038/s41587-020-0439-x
© 2025 Benjamin Wehnert, Taylor Mighell, Fei Sang, Ben Lehner, Maximilian Stammnitz
We welcome contributions from the community!
Please open an issue or pull request via this GitHub page to:
- Suggest or help implement new modules for custom workflows
- Report bugs and other challenges in running nf-core/dms
- Help improve this documentation
You can also reach us on the nf-core Slack in the `#dms` channel (join here).
For detailed scientific or technical questions, feedback and experimental discussions, feel free to contact us directly:
- Benjamin Wehnert — wehnertbenjamin@gmail.com
- Taylor Mighell — taylor.mighell@crg.eu
- Fei Sang — fs18@sanger.ac.uk
- Ben Lehner — bl11@sanger.ac.uk
- Maximilian Stammnitz — maximilian.stammnitz@crg.eu