
Investigating the performance of Oxford Nanopore long-read sequencing with respect to Illumina microarrays and short-read sequencing

Nextflow · Jupyter · Python · Code style: Black · License: MIT

This repository contains the complete workflow and analysis scripts for benchmarking Oxford Nanopore Technologies (ONT) long-read sequencing against established platforms (Illumina short-read sequencing and microarrays). The project evaluates the performance of ONT for detecting various genetic variants across different genomic contexts and examines the impact of experimental factors such as multiplexing, sequencing depth, and read length.

Abstract

Oxford Nanopore Technologies (ONT) long-read sequencing (LRS) has emerged as a promising genomic analysis tool, yet comprehensive benchmarks with established platforms across diverse datasets remain limited. This study aimed to benchmark LRS performance against Illumina short-read sequencing (SRS) and microarrays for variant detection across different genomic contexts and to evaluate the impact of experimental factors. We sequenced 14 human genomes using the three platforms and evaluated the detection of single nucleotide variants (SNVs), insertions/deletions (indels), and structural variants (SVs), stratifying by high-complexity, low-complexity, and dark genome regions while assessing the effects of multiplexing, depth, and read length. LRS SNV accuracy was slightly lower than that of SRS in high-complexity regions (F-measure: 0.954 vs. 0.967) but showed comparable sensitivity in low-complexity regions. LRS showed robust performance for small (1–5 bp) indels in high-complexity regions (F-measure: 0.869), but SRS agreement decreased significantly in low-complexity regions and for larger indel sizes. Within dark regions, LRS identified more indels than SRS, but showed lower base-level accuracy. LRS identified 2.86 times more SVs than SRS, excelling at detecting large variants (>6 kb), with SV detection improving with sequencing depth. Sequencing depth strongly influenced variant calling performance, whereas multiplexing effects were minimal. Our findings provide valuable insights for optimising LRS applications in genomic research and diagnostics.

Project Structure

.
├── config/              # Workflow configuration files
├── jobs/                # Slurm job submission scripts
│   ├── benchmark/       # Benchmarking scripts
│   ├── illumina/        # Illumina processing scripts
│   ├── jupyter/         # Jupyter notebook environment setup
│   ├── ont/             # ONT processing scripts
│   └── qc/              # Quality control scripts
├── modules/             # Nextflow modules
│   ├── indel_benchmark/ # Indel analysis modules
│   ├── setup/           # Data preparation modules
│   ├── shared/          # Shared utility modules
│   ├── snv_benchmark/   # SNV analysis modules
│   └── sv_consensus/    # Structural variant consensus modules
├── references/          # Genome and positional reference files
├── workflows/           # Nextflow sub-workflows
├── main.nf              # Main Nextflow workflow
├── nextflow.config      # Nextflow configuration
├── ont-benchmark.ipynb  # Jupyter notebook with statistical analyses
├── sample_ids.csv       # ONT and Illumina sample IDs dictionary
└── seq_stats.csv        # Table of experimental records for each flow cell

Setup

Prerequisites

Nextflow Pipeline

Jupyter Notebook

Data Requirements

The analysis pipeline expects:

  1. Oxford Nanopore sequencing data (processed through basecalling)
  2. Illumina short-read sequencing data (aligned and variant-called)
  3. Illumina microarray genotyping data
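Before launching the pipeline, it can help to sanity-check that the ONT-to-Illumina sample mapping loads cleanly. The sketch below is illustrative and not part of the pipeline; the column names (`ont_id`, `illumina_id`) are assumptions, so check the actual header of `sample_ids.csv` in this repository.

```python
import csv


def load_sample_map(path="sample_ids.csv"):
    """Load the ONT/Illumina sample ID dictionary as {ont_id: illumina_id}.

    NOTE: the column names 'ont_id' and 'illumina_id' are hypothetical;
    adjust them to match the real header of sample_ids.csv.
    """
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        return {row["ont_id"]: row["illumina_id"] for row in reader}
```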

NCBI API Key Setup

To ensure the pipeline functions correctly and to optimise access to NCBI resources, set the following Nextflow secrets before running the workflow:

nextflow secrets set NCBI_API_KEY <your_ncbi_api_key>
nextflow secrets set NCBI_EMAIL <your_ncbi_email>

These secrets are necessary for accessing NCBI resources during the analysis. By default, the NCBI Datasets API and command-line tool requests are rate-limited to 5 requests per second (rps). Using an API key increases this limit to 10 rps.
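These limits also apply if you script your own NCBI lookups outside the pipeline. A minimal client-side throttle (an illustrative sketch, not code from this repository) could enforce the appropriate request interval depending on whether an API key is set:

```python
import time


class NcbiThrottle:
    """Space out requests to respect NCBI rate limits.

    NCBI allows 5 requests per second without an API key and
    10 requests per second with one, so the minimum interval
    between requests is derived from whether a key is available.
    """

    def __init__(self, api_key=None):
        self.interval = 1.0 / (10 if api_key else 5)  # seconds per request
        self._last = 0.0

    def wait(self):
        """Sleep just long enough to keep under the rate limit."""
        now = time.monotonic()
        delay = self._last + self.interval - now
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Call `throttle.wait()` immediately before each request; the first call returns at once, and subsequent calls sleep only as long as needed.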

For more information on obtaining and using NCBI API keys, please refer to the NCBI Datasets API Keys Documentation.

You can verify that the secrets have been set correctly by listing them:

nextflow secrets list

For more information on managing secrets in Nextflow, refer to the Nextflow Secrets documentation.

Usage

Running the Complete Workflow

nextflow run KHP-Informatics/ont-benchmark

or

sbatch jobs/benchmark/variant_benchmark.sh

Results

Analysis results are presented in the ont-benchmark.ipynb Jupyter notebook, organised by variant type. Each benchmark includes:

  1. Precision, recall, and F-measure metrics
  2. Detailed comparison between ONT, Illumina, and microarray platforms
  3. Analysis of variant detection across different genomic contexts
  4. Impact assessment of sequencing parameters (depth, multiplexing, read length)
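For reference, the three headline metrics are all derived from true-positive (TP), false-positive (FP), and false-negative (FN) call counts against a truth set. A minimal sketch of the relationship (not the notebook's actual code):

```python
def benchmark_metrics(tp, fp, fn):
    """Precision, recall, and F-measure (F1) from variant call counts.

    tp: calls that match the truth set
    fp: calls absent from the truth set
    fn: truth-set variants that were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f_measure
```

The F-measure is the harmonic mean of precision and recall, so it only approaches 1 when both are high, which is why it is used as the single summary figure in the abstract.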

License

This project is licensed under the MIT License: you are free to use and modify the code, without warranty. See LICENSE for the full license text. The authors reserve the rights to the associated article content (see Citation below).

Citation

If you use this benchmark in your research, please cite: Santos, R., Lee, H., Williams, A., Baffour-Kyei, A., Lee, S.-H., Troakes, C., Al-Chalabi, A., Breen, G., & Iacoangeli, A. (2025). Investigating the Performance of Oxford Nanopore Long-Read Sequencing with Respect to Illumina Microarrays and Short-Read Sequencing. International Journal of Molecular Sciences, 26(10), 4492. https://doi.org/10.3390/ijms26104492
