preqclr is a software tool that reports on the quality of long read sequencing data without the use of a reference genome. With the emergence of long read sequencing technologies such as PacBio Single Molecule, Real-Time (SMRT) Sequencing and Oxford Nanopore Technologies (ONT), there is a need for a method that assesses sequencing quality prior to downstream analyses. preqclr computes a set of quality metrics from the reads and their overlaps and lets users visualize them.
There are two components to preqclr:
1. calculate
2. report
preqclr generates a PDF report with the following plots:
- Estimated genome size
- Read length distribution
- Estimated coverage distribution
- Per read GC content distribution
- Estimated coverage vs read length
- Total number of bases as a function of minimum read length
- NG(X)
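Three of the plotted quantities are simple enough to sketch in a few lines. The following is a minimal illustration of per read GC content, total bases as a function of minimum read length, and NG(X). It is not preqclr's own code, and the function names are hypothetical. NG(X) is taken here in its usual sense: the length of the contig at which the running total of contig lengths, longest first, reaches X percent of the estimated genome size.

```python
def gc_fraction(seq):
    """Fraction of G/C bases in one read (case-insensitive); 0.0 if empty."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / float(len(seq))


def total_bases_by_min_length(read_lengths, thresholds):
    """For each threshold, sum the lengths of all reads at least that long."""
    return [sum(n for n in read_lengths if n >= t) for t in thresholds]


def ngx(contig_lengths, genome_size, x):
    """Contig length at which the cumulative length, longest contig first,
    reaches x percent of genome_size. Returns 0 if it is never reached."""
    target = genome_size * x / 100.0
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0
```

For example, with contigs of 50, 30 and 20 bases and an estimated genome size of 100, `ngx(..., 50)` is 50 and `ngx(..., 80)` is 30. If the contigs cover less than X percent of the genome size, the sketch returns 0.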
To create the files needed:
- minimap2 (for overlap detection)
- miniasm (optional, for the NG(X) plot)
For the calculation step:
- C++ compiler with C++11 support
For the report generation step:
- Python 2.7.11
- matplotlib
- BioPython
- setuptools (to download report script dependencies)
To install from source:
git clone --recursive https://github.com/simpsonlab/preqclr.git
cd preqclr
make
# download report script dependencies
# create virtual environment
virtualenv preqclr-venv
source preqclr-venv/bin/activate
python setup.py install
# STEP 0: overlap detection and contig lengths
    minimap2 -x ava-ont reads.fq reads.fq > overlaps.paf
    miniasm -f reads.fq overlaps.paf > layout.gfa
# STEP 1: calculate data for plots
    ./preqclr -r reads.fq \
              --paf overlaps.paf \
              --gfa layout.gfa \
              -n ecoli_sample.pacbio \
              --verbose
# STEP 2: create a PDF report
python preqclr-report.py -i ecoli_sample.pacbio.preqclr --verbose
- When using minimap2, we recommend the presets optimized for PacBio reads (-x ava-pb) or ONT reads (-x ava-ont), chosen to match the sequencing platform of the input reads.
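The three steps can also be chained from a small driver script. The sketch below only assembles the command lines shown above and picks the minimap2 preset from the platform. The function names, the `platform` argument, and the fixed intermediate file names `overlaps.paf` and `layout.gfa` are assumptions made for illustration; nothing here is shipped with preqclr.

```python
import subprocess


def build_commands(reads, sample, platform="ont"):
    """Return a list of (argv, stdout_path) pairs for STEP 0 to STEP 2.

    stdout_path is the file that should receive the command's standard
    output, or None when the command writes its own output files.
    """
    presets = {"pacbio": "ava-pb", "ont": "ava-ont"}
    if platform not in presets:
        raise ValueError("platform must be 'pacbio' or 'ont'")
    paf, gfa = "overlaps.paf", "layout.gfa"  # hypothetical fixed names
    return [
        # STEP 0: overlap detection and contig lengths
        (["minimap2", "-x", presets[platform], reads, reads], paf),
        (["miniasm", "-f", reads, paf], gfa),
        # STEP 1: calculate data for plots
        (["./preqclr", "-r", reads, "--paf", paf, "--gfa", gfa,
          "-n", sample, "--verbose"], None),
        # STEP 2: create a PDF report
        (["python", "preqclr-report.py", "-i", sample + ".preqclr",
          "--verbose"], None),
    ]


def run_steps(steps):
    """Run each command in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.check_call(argv)
        else:
            with open(stdout_path, "w") as out:
                subprocess.check_call(argv, stdout=out)
```

Calling `run_steps(build_commands("reads.fq", "ecoli_sample.pacbio", platform="pacbio"))` from the preqclr directory would then run the same commands as above with the PacBio preset.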
preqclr uses overlap and assembly information from minimap2 and miniasm, respectively. To parse the PAF output and to read FASTA files efficiently, preqclr uses the kseq and PAF parsers from miniasm. I would like to thank Heng Li for developing these tools.
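To give a feel for the overlap information in a PAF file, here is a minimal Python sketch that reads the 12 mandatory PAF columns and derives a rough per read coverage estimate. preqclr itself does this in C++ with the miniasm parser; the coverage formula used here (overlapped bases on a read divided by the read's length) and both function names are assumptions for illustration, not a description of preqclr's internals.

```python
from collections import defaultdict


def parse_paf_line(line):
    """Parse the 12 mandatory tab-separated PAF columns into a dict."""
    f = line.rstrip("\n").split("\t")
    return {
        "qname": f[0], "qlen": int(f[1]), "qstart": int(f[2]), "qend": int(f[3]),
        "strand": f[4],
        "tname": f[5], "tlen": int(f[6]), "tstart": int(f[7]), "tend": int(f[8]),
        "matches": int(f[9]), "block_len": int(f[10]), "mapq": int(f[11]),
    }


def per_read_coverage(paf_lines):
    """Rough coverage per read: overlapped bases divided by read length.

    Each overlap is credited to both the query and the target read,
    assuming every read pair appears only once in the PAF input.
    """
    covered = defaultdict(int)
    lengths = {}
    for line in paf_lines:
        if not line.strip():
            continue
        rec = parse_paf_line(line)
        if rec["qname"] == rec["tname"]:
            continue  # ignore self-overlaps
        covered[rec["qname"]] += rec["qend"] - rec["qstart"]
        lengths[rec["qname"]] = rec["qlen"]
        covered[rec["tname"]] += rec["tend"] - rec["tstart"]
        lengths[rec["tname"]] = rec["tlen"]
    return {name: covered[name] / float(lengths[name]) for name in lengths}
```

With a real file this could be called as `per_read_coverage(open("overlaps.paf"))`. Reads that have no overlaps never appear in the PAF file, so they are absent from the result rather than reported with coverage 0.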



