snakemake-ont-basecalling

A Snakemake workflow for basecalling and demultiplexing Oxford Nanopore (ONT) data using Dorado.

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

If you use this workflow in a paper, please credit the authors by citing the URL of this repository.

Workflow overview

This workflow uses Oxford Nanopore's basecaller dorado to basecall and demultiplex Oxford Nanopore (ONT) data. Instead of running dorado as a single job over all pod5 files, it basecalls each pod5 file separately, so that every pod5 file gets its own job. The basecalled bam files are then demultiplexed and a summary report is generated. The workflow is built with snakemake and consists of the following steps:

  1. Parse runs.csv table containing the run's metadata (python)
  2. Download the model for base calling as defined in the runs table
  3. Call bases using dorado in simplex mode on each pod5 file separately (dorado basecaller)
  4. Demultiplex ONT data (dorado demux)
  5. Aggregate .fastq files based on barcode and compress (bgzip)
  6. Summarize basecalling information (dorado summary)
  7. Collect QC metrics and generate reports (pycoQC, NanoPlot)
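
The per-file scatter behind step 3 can be sketched as follows. This is a minimal illustration and not the workflow's actual rule code: the function name plan_basecalling_jobs and the output naming scheme (one bam file named after each pod5 file) are assumptions made for the example.

```python
from pathlib import Path


def plan_basecalling_jobs(data_folder, out_dir):
    """Return one (pod5_input, bam_output) pair per pod5 file in data_folder."""
    data_folder = Path(data_folder)
    out_dir = Path(out_dir)
    jobs = []
    # sorted() keeps the job order deterministic across invocations
    for pod5 in sorted(data_folder.glob("*.pod5")):
        # hypothetical naming scheme: <out_dir>/<pod5 file stem>.bam
        jobs.append((pod5, out_dir / f"{pod5.stem}.bam"))
    return jobs
```

Because every pair is independent of the others, a scheduler can run the basecalling jobs in parallel and rerun only the files that failed.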

Requirements

Installation

Step 1: Clone this repository

git clone https://github.com/MPUSP/snakemake-ont-basecalling.git
cd snakemake-ont-basecalling

Step 2: Install dependencies

It is recommended to install snakemake and run the workflow with conda or mamba. Miniforge is the preferred conda-forge installer and includes conda, mamba and their dependencies.

Step 3: Create snakemake environment

This step creates a new conda environment called snakemake-ont-basecalling.

# create new environment with dependencies & activate it
mamba create -c conda-forge -c bioconda -n snakemake-ont-basecalling "snakemake>=8.24.1" snakemake-executor-plugin-slurm pandas python=3.12
conda activate snakemake-ont-basecalling

Note:

Snakemake automatically installs all other workflow dependencies as conda environments when the workflow is run with the --sdm conda parameter (recommended).

Step 4: Install Dorado

Step 5: Create all rule specific environments (optional)

This optional step pre-creates all conda environments specified in the snakemake rules.

# activate new environment
conda activate snakemake-ont-basecalling
# create the rule-specific conda environments without running any jobs
snakemake -c 1 --sdm conda --conda-create-envs-only --conda-cleanup-pkgs cache --directory .test

Running the workflow

Input data

This workflow requires pod5 input data. These input files are supplied to the workflow using a mandatory runs table linked in the config.yml file (default: .test/config/runs.csv). Each row in the runs table corresponds to a single run, for which all pod5 files are provided via a data_folder column. Multiple runs can be defined in the table. The runs table has the following layout:

run_id       data_folder   basecalling_model                   barcode_kit
MK1C_run_01  ".test/data"  dna_r10.4.1_e8.2_400bps_sup@v5.0.0  SQK-PCB114-24
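
As an illustration of this layout, the sketch below reads a runs table and checks that the four columns are present. It is not the workflow's own parser. It assumes the file is comma-separated (suggested by the .csv extension) and that run_id values are unique; the function name read_runs_table is hypothetical.

```python
import csv
import io

# column names taken from the runs table layout shown above
REQUIRED_COLUMNS = ("run_id", "data_folder", "basecalling_model", "barcode_kit")


def read_runs_table(handle):
    """Parse a runs table from an open text handle into a dict keyed by run_id."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"runs table is missing columns: {', '.join(missing)}")
    runs = {}
    for row in reader:
        run_id = row["run_id"]
        # assumption: each run_id identifies exactly one run
        if run_id in runs:
            raise ValueError(f"duplicate run_id: {run_id}")
        runs[run_id] = row
    return runs


# in-memory stand-in for a runs.csv file, using the example row from above
example = io.StringIO(
    "run_id,data_folder,basecalling_model,barcode_kit\n"
    'MK1C_run_01,".test/data",dna_r10.4.1_e8.2_400bps_sup@v5.0.0,SQK-PCB114-24\n'
)
runs = read_runs_table(example)
print(runs["MK1C_run_01"]["data_folder"])  # prints: .test/data
```

The csv module strips the surrounding double quotes from the data_folder value, so the path can be used directly.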

Execution

Rule-specific resources such as GPU usage are defined through configuration profiles. See the snakemake docs on profiles for more information. A default profile for local testing and a slurm-specific cluster profile are provided with this workflow.

To run the workflow from command line, change to the working directory and activate the conda environment.

cd snakemake-ont-basecalling
conda activate snakemake-ont-basecalling

Adjust options in the default config file config/config.yml. Before running the entire workflow, you can perform a dry run using:

snakemake --cores 3 --sdm conda --directory .test --dry-run

To run the complete workflow with test files using conda, execute the following command.

snakemake --cores 3 --sdm conda --directory .test

To run the complete workflow with test files on a slurm cluster, adjust the slurm-specific config.yaml file and execute the following command.

snakemake --sdm conda --workflow-profile workflow/profiles/slurm/ --directory .test

Note: It is recommended to start the snakemake pipeline on the cluster using a session multiplexer like screen or tmux.
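
When the workflow is launched from a script rather than typed by hand, the invocations above can be assembled programmatically. The following is a minimal sketch that uses only the command-line flags shown in this section; the helper name build_snakemake_command is hypothetical and not part of the workflow.

```python
import shlex


def build_snakemake_command(directory, cores=None, workflow_profile=None, dry_run=False):
    """Assemble a snakemake invocation as an argument list."""
    cmd = ["snakemake"]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    # let snakemake manage rule dependencies through conda
    cmd += ["--sdm", "conda"]
    if workflow_profile is not None:
        cmd += ["--workflow-profile", workflow_profile]
    cmd += ["--directory", directory]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


print(shlex.join(build_snakemake_command(".test", cores=3, dry_run=True)))
# prints: snakemake --cores 3 --sdm conda --directory .test --dry-run
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting issues.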

Authors

References

Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. Sustainable data analysis with Snakemake. F1000Research, 10:33, 2021. https://doi.org/10.12688/f1000research.29032.2.
