snakemake-ont-basecalling

A Snakemake workflow to perform basecalling and demultiplexing of Oxford Nanopore (ONT) data using Dorado.

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository.

Workflow overview

This workflow uses Oxford Nanopore's basecaller dorado for basecalling and demultiplexing Oxford Nanopore (ONT) data. Instead of running dorado as a single job that takes all pod5 files as input, basecalling is performed on each pod5 file separately, resulting in one job per pod5 file. The basecalled BAM files are then demultiplexed, and a summary report is generated. The workflow is built using snakemake and consists of the following steps:

1. Parse the runs.csv table containing the run metadata (python)
2. Download the basecalling model defined in the runs table
3. Call bases using dorado in simplex mode on each pod5 file separately (dorado basecaller)
4. Demultiplex ONT data (dorado demux)
5. Aggregate .fastq files by barcode and compress them (bgzip)
6. Summarize basecalling information (dorado summary)
7. Collect QC metrics and generate reports (pycoQC, NanoPlot)
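
The per-file fan-out described above can be illustrated with a small Python sketch. It is not taken from the workflow's source: the function name `plan_basecalling_jobs`, the `results` output directory and the `<run_id>/<file stem>.bam` naming scheme are assumptions made only for illustration. It shows how each pod5 file in a run's data folder could map to its own basecalling output, so that every file becomes an independent job.

```python
from pathlib import Path


def plan_basecalling_jobs(run_id: str, data_folder: str, out_dir: str = "results") -> dict[str, str]:
    """Map every pod5 file of a run to its own BAM output path.

    The output layout <out_dir>/<run_id>/<file stem>.bam is hypothetical;
    the real workflow may name its files differently.
    """
    jobs = {}
    # sorted() keeps the job order deterministic across file systems
    for pod5 in sorted(Path(data_folder).glob("*.pod5")):
        bam = Path(out_dir) / run_id / f"{pod5.stem}.bam"
        # one entry per pod5 file, i.e. one basecalling job per input file
        jobs[str(pod5)] = str(bam)
    return jobs
```

Calling `plan_basecalling_jobs("MK1C_run_01", ".test/data")` would return one input/output pair per pod5 file found in that folder. Files with other extensions are ignored.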

Requirements

Dorado (0.8+ tested). It can be downloaded and installed from https://github.com/nanoporetech/dorado.

Installation

Step 1: Clone this repository

git clone https://github.com/MPUSP/snakemake-ont-basecalling.git
cd snakemake-ont-basecalling

Step 2: Install dependencies

It is recommended to install snakemake and run the workflow with conda or mamba. Miniforge is the preferred conda-forge installer and includes conda, mamba and their dependencies.

Step 3: Create snakemake environment

This step creates a new conda environment called snakemake-ont-basecalling.

# create new environment with dependencies & activate it
mamba create -c conda-forge -c bioconda -n snakemake-ont-basecalling "snakemake>=8.24.1" snakemake-executor-plugin-slurm pandas python=3.12
conda activate snakemake-ont-basecalling

Note:

All other dependencies for the workflow are automatically pulled as conda environments by snakemake when the workflow is run with the --sdm conda parameter (recommended).

Step 4: Install Dorado

Dorado can be downloaded and installed locally from https://github.com/nanoporetech/dorado.
After installation, define the path to the dorado binary in the config file.
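
Before starting a run, it can help to confirm that the configured path really points to an executable. The following Python sketch is one possible check, not part of the workflow itself: the helper name `resolve_dorado` is hypothetical, and how the path is read from the config file is left out because the config key is not given here.

```python
import shutil
from pathlib import Path


def resolve_dorado(dorado_path: str) -> str:
    """Return the absolute path of the dorado executable or raise an error.

    `dorado_path` is the value taken from the config file; it may be a full
    path to the binary or a bare command name that is looked up on PATH.
    """
    resolved = shutil.which(dorado_path)
    if resolved is None:
        raise FileNotFoundError(
            f"dorado executable not found or not executable: {dorado_path!r}"
        )
    return str(Path(resolved).resolve())
```

The check only verifies that the file exists and is executable. It does not call dorado or verify its version.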

Step 5: Create all rule specific environments (optional)

This optional step creates all conda environments specified in the snakemake rules ahead of time.

# activate new environment
conda activate snakemake-ont-basecalling
# create all rule-specific conda environments without running any jobs
snakemake -c 1 --sdm conda --conda-create-envs-only --conda-cleanup-pkgs cache --directory .test

Running the workflow

Input data

This workflow requires pod5 input data. These input files are supplied to the workflow using a mandatory runs table linked in the config.yml file (default: .test/config/runs.csv). Each row in the runs table corresponds to a single run, for which all pod5 files are provided via a data_folder column. Multiple runs can be defined in the table. The runs table has the following layout:

run_id	data_folder	basecalling_model	barcode_kit
MK1C_run_01	".test/data"	dna_r10.4.1_e8.2_400bps_sup@v5.0.0	SQK-PCB114-24
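
As a rough illustration of how such a runs table could be read and checked, here is a minimal Python sketch using only the standard library. It assumes the file is comma-separated (as the .csv extension suggests) and uses the four column names shown above. The function name `read_runs_table` and the rule that run_id values must be unique are assumptions for this example and may differ from what the workflow actually enforces.

```python
import csv
import io

# column names as shown in the runs table layout above
REQUIRED_COLUMNS = ["run_id", "data_folder", "basecalling_model", "barcode_kit"]


def read_runs_table(handle) -> list[dict[str, str]]:
    """Read a comma-separated runs table and return one dict per run."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"runs table is missing columns: {missing}")

    runs, seen = [], set()
    for row in reader:
        # short rows yield None for missing fields, so fall back to ""
        run = {col: (row[col] or "").strip() for col in REQUIRED_COLUMNS}
        if not run["run_id"]:
            raise ValueError("every row needs a non-empty run_id")
        # uniqueness of run_id is an assumption of this sketch
        if run["run_id"] in seen:
            raise ValueError(f"duplicate run_id: {run['run_id']}")
        seen.add(run["run_id"])
        runs.append(run)
    return runs


# in-memory example mirroring the table above
example = io.StringIO(
    "run_id,data_folder,basecalling_model,barcode_kit\n"
    'MK1C_run_01,".test/data",dna_r10.4.1_e8.2_400bps_sup@v5.0.0,SQK-PCB114-24\n'
)
runs = read_runs_table(example)
print(runs[0]["run_id"], runs[0]["data_folder"])
```

The quotes around the data folder are handled by the CSV parser, so the returned value is the plain path.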

Execution

Rule-specific resources such as GPU usage are defined through configuration profiles. See the snakemake docs on profiles for more information. A default profile for local testing and a slurm-specific cluster profile are provided with this workflow.

To run the workflow from the command line, change to the working directory and activate the conda environment.

cd snakemake-ont-basecalling
conda activate snakemake-ont-basecalling

Adjust options in the default config file config/config.yml. Before running the entire workflow, you can perform a dry run using:

snakemake --cores 3 --sdm conda --directory .test --dry-run

To run the complete workflow with test files using conda, execute the following command.

snakemake --cores 3 --sdm conda --directory .test

To run the complete workflow with test files on a slurm cluster, adjust the slurm-specific config.yaml file of the cluster profile and execute the following command.

snakemake --sdm conda --workflow-profile workflow/profiles/slurm/ --directory .test

Note: It is recommended to start the snakemake pipeline on the cluster using a session multiplexer like screen or tmux.

Authors

Dr. Rina Ahmed-Begrich
- Affiliation: Max-Planck-Unit for the Science of Pathogens (MPUSP), Berlin, Germany
- ORCID profile: https://orcid.org/0000-0002-0656-1795
Dr. Michael Jahn
- Affiliation: Max-Planck-Unit for the Science of Pathogens (MPUSP), Berlin, Germany
- ORCID profile: https://orcid.org/0000-0002-3913-153X
- github page: https://github.com/m-jahn

References

Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. Sustainable data analysis with Snakemake. F1000Research, 10:33, 2021. https://doi.org/10.12688/f1000research.29032.2.
