fusemblr is a pipeline wrapper designed for the assembly of complex genomes using Nanopore reads and paired-end Illumina reads.
fusemblr was designed for the Fusarium oxysporum assembly project (hence the name).
The pipeline uses Nanopore reads (the longer and the higher the coverage, the better) and paired-end Illumina reads; PacBio HiFi reads are optional.
Notably, providing PacBio HiFi reads had very little impact on the resulting assemblies for our Fusarium oxysporum datasets, as we used recently basecalled ONT data with high coverage and a good subset of long reads.
conda install samtobam::fusemblr
fusemblr.sh -n nanopore.fq.gz -1 illumina.R1.fq.gz -2 illumina.R2.fq.gz -g 70000000
Required inputs:
-n | --nanopore Nanopore long reads used for assembly in fastq or fasta format (*.fastq / *.fq) and can be gzipped (*.gz)
-1 | --pair1 Paired end illumina reads in fastq format; first pair. Used for Ratatosk polishing and PAQman evaluation. Can be gzipped (*.gz)
-2 | --pair2 Paired end illumina reads in fastq format; second pair. Used for Ratatosk polishing and PAQman evaluation. Can be gzipped (*.gz)
-g | --genomesize Estimated genome size in base pairs, required for downsampling and assembly
Recommended inputs:
-h | --hifi PacBio HiFi reads, used for assembly polishing with NextPolish2 (Recommended if available)
-t | --threads Number of threads for tools that accept this option (default: 1)
Optional parameters:
-m | --minsize Minimum size of reads to keep during downsampling (Default: 5000)
-x | --coverage Target coverage (X) for downsampling; the number of bases kept is coverage*genomesize (Default: 100)
-v | --minovl Minimum overlap for Flye assembly (Default: Calculated during run as N95 of reads used for assembly)
-w | --weight The weighting used by Filtlong for selecting reads; balancing the length vs the quality (Default: 5)
-p | --prefix Prefix for output (default: name of assembly file (-a) before the fasta suffix)
-o | --output Name of output folder for all results (default: fusemblr_output)
-c | --cleanup Remove a large number of files produced by each of the tools that can take up a lot of space. Choose between 'yes' or 'no' (default: 'yes')
-h | --help Print this help message
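To make the -x and -v defaults above concrete, here is a minimal sketch of the two calculations they describe. It is not taken from fusemblr.sh: the variable names and the n95 helper are hypothetical, and it assumes the usual definition of N95 (the read length L such that reads of length L or longer contain at least 95% of all bases). The pipeline's own implementation may differ.

```shell
#!/usr/bin/env bash
# Illustrative only: names below are hypothetical, not fusemblr internals.

genomesize=70000000   # value passed to -g in the example command
coverage=100          # default for -x

# Downsampling target: coverage * genomesize bases.
target_bases=$(( coverage * genomesize ))
echo "target bases: ${target_bases}"

# N95 of read lengths (one length per line on stdin), assuming the usual
# definition: sort lengths from longest to shortest and report the length
# at which the running sum first reaches 95% of the total bases.
n95() {
  sort -rn | awk '
    { len[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        cum += len[i]
        if (cum >= 0.95 * total) { print len[i]; exit }
      }
    }'
}

# Toy read lengths, purely for demonstration.
toy_n95=$(printf '%s\n' 20000 15000 10000 4000 1000 | n95)
echo "toy N95: ${toy_n95}"
```

With the example genome size and the default coverage, the target is 7000000000 bases (7 Gb). For the toy lengths, which total 50000 bases, the running sum first passes 47500 at the 4000 bp read, so the N95 is 4000.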
2. Polishing of downsampled reads with the paired-end Illumina reads using Meryl and Ratatosk correct
- uses a baseline quality score (-Q) of 90 and therefore assumes reasonably recent ONT data (e.g. R10 or high-accuracy basecalling)
Following assembly, it is recommended that you run PAQman on the resulting assembly to comprehensively check its quality.
This can also help you compare multiple assemblies and identify the best one.