A Snakemake workflow to perform basecalling and demultiplexing of Oxford Nanopore (ONT) data using Dorado.
The usage of this workflow is described in the Snakemake Workflow Catalog.
If you use this workflow in a paper, please give credit to the authors by citing the URL of this repository.
This workflow uses Oxford Nanopore's basecaller dorado to basecall and demultiplex Oxford Nanopore (ONT) data. Instead of running dorado as a single job that takes all pod5 files as input, the workflow basecalls each pod5 file separately, resulting in one job per pod5 file. The basecalled bam files are then demultiplexed, and a summary report is generated.
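The one-job-per-file idea can be illustrated with a minimal Python sketch that maps each pod5 file in a run folder to its own output bam path. This is not the workflow's actual rule code: the function name `plan_basecalling_jobs` and the output layout are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def plan_basecalling_jobs(data_folder, out_dir):
    """Map every pod5 file in data_folder to its own output bam path.

    Hypothetical helper: the real workflow derives its jobs from
    snakemake rules, not from this function.
    """
    data_folder = Path(data_folder)
    out_dir = Path(out_dir)
    jobs = {}
    # Sorting keeps the job order stable between invocations.
    for pod5 in sorted(data_folder.glob("*.pod5")):
        # One job per input file: <name>.pod5 -> <out_dir>/<name>.bam
        jobs[pod5] = out_dir / f"{pod5.stem}.bam"
    return jobs
```

Calling `plan_basecalling_jobs(".test/data", "results")` would return one entry per pod5 file, each of which could be processed independently. The output folder name `results` is likewise an assumption.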
The workflow is built using snakemake and consists of the following steps:
- Parse the runs.csv table containing the run's metadata (python)
- Download the model for basecalling as defined in the runs table
- Call bases using dorado in simplex mode on each pod5 file separately (dorado basecaller)
- Demultiplex ONT data (dorado demux)
- Aggregate .fastq files based on barcode and compress (bgzip)
- Summarize basecalling information (dorado summary)
- Collect QC metrics and generate reports (pycoQC, NanoPlot)
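As a rough illustration of the aggregation step, the following Python sketch groups per-file .fastq outputs by barcode and writes one compressed file per barcode. It rests on several assumptions that are not taken from the workflow: file names are assumed to contain a `barcodeNN` or `unclassified` token, the helper names are hypothetical, and Python's `gzip` module stands in for bgzip (it writes plain gzip rather than the blocked BGZF format that bgzip produces).

```python
import gzip
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: the barcode label appears in the file name.
BARCODE_RE = re.compile(r"(barcode\d+|unclassified)")


def group_by_barcode(fastq_files):
    """Group fastq paths by the barcode label found in their file names."""
    groups = defaultdict(list)
    for path in sorted(Path(p) for p in fastq_files):
        match = BARCODE_RE.search(path.name)
        if match:  # files without a recognizable label are skipped
            groups[match.group(1)].append(path)
    return dict(groups)


def aggregate(groups, out_dir):
    """Concatenate each group into <out_dir>/<barcode>.fastq.gz."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for barcode, paths in groups.items():
        target = out_dir / f"{barcode}.fastq.gz"
        with gzip.open(target, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    # Stream the file so large inputs are not held in memory.
                    shutil.copyfileobj(src, out)
        outputs[barcode] = target
    return outputs
```

Because every pod5 file is basecalled and demultiplexed separately, reads for the same barcode end up in several files, which is why a concatenation step of this kind is needed before compression.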
- Dorado (0.8+ tested). It can be downloaded and installed from https://github.com/nanoporetech/dorado.
Step 1: Clone this repository
git clone https://github.com/MPUSP/snakemake-ont-basecalling.git
cd snakemake-ont-basecalling
Step 2: Install dependencies
It is recommended to install snakemake and run the workflow with conda or mamba. Miniforge is the preferred conda-forge installer and includes conda, mamba and their dependencies.
Step 3: Create snakemake environment
This step creates a new conda environment called snakemake-ont-basecalling.
# create new environment with dependencies & activate it
mamba create -c conda-forge -c bioconda -n snakemake-ont-basecalling "snakemake>=8.24.1" snakemake-executor-plugin-slurm pandas python=3.12
conda activate snakemake-ont-basecalling
Note:
All other dependencies for the workflow are automatically pulled as conda environments by snakemake when running the workflow with the --sdm conda parameter (recommended).
Step 4: Install Dorado
- Dorado can be downloaded and installed locally from https://github.com/nanoporetech/dorado.
- Define the path to the dorado binary in the config file
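Before starting a run, it can help to confirm that the configured dorado path actually points to an executable. The sketch below uses only the Python standard library; the function name `resolve_dorado` is hypothetical and the workflow itself may perform no such check.

```python
import os
import shutil


def resolve_dorado(path):
    """Return the absolute path of an executable dorado binary or raise.

    `path` may be a bare command name (looked up on PATH) or a file path,
    for example the value taken from the config file.
    """
    resolved = shutil.which(path)
    if resolved is None:
        raise FileNotFoundError(
            f"dorado binary not found or not executable: {path}"
        )
    return os.path.abspath(resolved)
```

Calling `resolve_dorado("dorado")` searches the PATH, while passing a full path checks that specific file.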
Step 5: Create all rule specific environments (optional)
This optional step creates all conda environments specified in the snakemake rules ahead of the first run.
# activate new environment
conda activate snakemake-ont-basecalling
snakemake -c 1 --sdm conda --conda-create-envs-only --conda-cleanup-pkgs cache --directory .test
This workflow requires pod5 input data. These input files are supplied to the workflow using a mandatory runs table linked in the config.yml file (default: .test/config/runs.csv). Each row in the runs table corresponds to a single run, and the folder containing all pod5 files of that run is given in the data_folder column. Multiple runs can be defined in the table.
The runs table has the following layout:
| run_id | data_folder | basecalling_model | barcode_kit |
|---|---|---|---|
| MK1C_run_01 | ".test/data" | dna_r10.4.1_e8.2_400bps_sup@v5.0.0 | SQK-PCB114-24 |
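A runs table with this layout can be read and checked with a few lines of Python. The sketch below is an illustration only: it assumes a comma-separated file with a header row, uses the standard `csv` module rather than whatever the workflow uses internally, and treats duplicate `run_id` values as an error, which is an assumption and not a documented rule. The function name `read_runs` is hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("run_id", "data_folder", "basecalling_model", "barcode_kit")


def read_runs(handle):
    """Parse a runs table from an open text handle into a dict keyed by run_id."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"runs table is missing columns: {missing}")
    runs = {}
    for row in reader:
        run_id = row["run_id"].strip()
        if run_id in runs:
            # Assumption: each run must have a unique identifier.
            raise ValueError(f"duplicate run_id: {run_id}")
        runs[run_id] = {col: row[col].strip() for col in REQUIRED_COLUMNS[1:]}
    return runs


# Example using the row shown in the table above.
example = io.StringIO(
    "run_id,data_folder,basecalling_model,barcode_kit\n"
    'MK1C_run_01,".test/data",dna_r10.4.1_e8.2_400bps_sup@v5.0.0,SQK-PCB114-24\n'
)
runs = read_runs(example)
print(runs["MK1C_run_01"]["data_folder"])
```

For a file on disk, the same function can be used as `read_runs(open(".test/config/runs.csv", newline=""))`.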
Rule-specific resources such as GPU usage are defined in configuration profiles. See the snakemake docs on profiles for more information. A default profile for local testing and a slurm-specific cluster profile are provided with this workflow.
To run the workflow from the command line, change to the working directory and activate the conda environment.
cd snakemake-ont-basecalling
conda activate snakemake-ont-basecalling
Adjust options in the default config file config/config.yml. Before running the entire workflow, you can perform a dry run using:
snakemake --cores 3 --sdm conda --directory .test --dry-run
To run the complete workflow with test files using conda, execute the following command.
snakemake --cores 3 --sdm conda --directory .test
To run the complete workflow with test files on a slurm cluster, adjust the slurm cluster-specific config.yaml file and execute the following command.
snakemake --sdm conda --workflow-profile workflow/profiles/slurm/ --directory .test
Note: It is recommended to start the snakemake pipeline on the cluster using a session multiplexer like screen or tmux.
- Dr. Rina Ahmed-Begrich
  - Affiliation: Max-Planck-Unit for the Science of Pathogens (MPUSP), Berlin, Germany
  - ORCID profile: https://orcid.org/0000-0002-0656-1795
- Dr. Michael Jahn
  - Affiliation: Max-Planck-Unit for the Science of Pathogens (MPUSP), Berlin, Germany
  - ORCID profile: https://orcid.org/0000-0002-3913-153X
  - GitHub page: https://github.com/m-jahn
Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. Sustainable data analysis with Snakemake. F1000Research, 10, 33 (2021). https://doi.org/10.12688/f1000research.29032.2.