Disclaimer: the workflow is in a prototype state, and the configuration may change at any time.
This workflow creates state-of-the-art genome hybrid assemblies for diploid vertebrate species. This version of the workflow was developed for the following scenario:
- input species: human
    - successfully tested as well: muntjac
- assembler: Verkko v1.4.1
    - hifiasm v0.19.x is not yet fully integrated
- inputs:
    - long accurate reads: PacBio HiFi (Sequel-II/Revio)
        - required coverage: at least ~40X, ideally ~60X
    - long connecting reads: Oxford Nanopore ultralong (R9)
        - required coverage: ~30X ultralong (>100 kbp) reads
    - optional input for phasing:
        - trio: k-mer databases created with meryl
        - Hi-C: Hi-C short reads
        - graph/node coloring for Verkko's Rukki (GFA file)
- outputs:
    - main: whole-genome assembly, potentially phased
    - main: basic (length) statistics about assembly and long reads
    - optional: a coordinate map between the homopolymer-compressed assembly graph and the linearized plain FASTA files
The sample sheet must be a TAB-separated text file (`.tsv` file extension) with at least the columns `sample`, `hifi` and `ont`, where both the `hifi` and `ont` columns can hold an arbitrary number of input file paths (comma-separated, e.g., `file_path1,file_path2,file_path3`) representing the respective read dataset for that sample. Common file extensions are recognised (e.g., `.fastq.gz`, `.fq.gz` and so on).
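As an illustration of the layout described above, the following sketch writes a minimal sample sheet and reads it back, splitting the comma-separated path lists. The sample name, the file paths and the helper functions are hypothetical and exist only for this example; this is not the parser the workflow itself uses.

```python
import csv
import io

# Hypothetical sample name and file paths, for illustration only.
rows = [
    {
        "sample": "sampleA",
        "hifi": "/data/sampleA/hifi_run1.fastq.gz,/data/sampleA/hifi_run2.fastq.gz",
        "ont": "/data/sampleA/ont_ul.fq.gz",
    },
]


def write_sample_sheet(rows, handle):
    """Write rows as a TAB-separated table with the three mandatory columns."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["sample", "hifi", "ont"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


def read_sample_sheet(handle):
    """Map each sample name to its lists of HiFi and ONT input paths."""
    samples = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        samples[row["sample"]] = {
            "hifi": row["hifi"].split(","),
            "ont": row["ont"].split(","),
        }
    return samples


# Round trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_sample_sheet(rows, buffer)
sheet_text = buffer.getvalue()
samples = read_sample_sheet(io.StringIO(sheet_text))
print(sheet_text)
```

To produce an actual sample sheet, pass a file handle opened on a `.tsv` path to `write_sample_sheet` instead of the in-memory buffer.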
The Verkko assembler can optionally be configured to use one of three different phasing signals; add the column `target` to the sample sheet plus the following fields:
- trio-based: set the value `trio` in column `target` and add the columns `hap1` and `hap2` pointing to meryl k-mer databases of the sample's parents (conventionally, `hap1` should be the father and `hap2` the mother)
- Hi-C: set the value `hic` in column `target` and add the fields `hic1` and `hic2` for the Hi-C reads of mate 1 and 2, respectively
- Strand-seq: set the value `sseq` in column `target` and add the field `phasing_paths` pointing to a `.gaf` format file produced by the Graphasing pipeline
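
The column requirements for the three phasing targets can be checked before starting a run. The sketch below is a hypothetical pre-flight check, not part of the workflow: the function name is made up, and it assumes that a row without a `target` value requests no phasing and that any other `target` value is an error, neither of which is confirmed by the workflow documentation.

```python
# Extra columns each phasing target needs, as listed in the text above.
REQUIRED_BY_TARGET = {
    "trio": ("hap1", "hap2"),
    "hic": ("hic1", "hic2"),
    "sseq": ("phasing_paths",),
}


def missing_phasing_columns(row):
    """Return the phasing columns that are absent or empty in one sample sheet row."""
    target = row.get("target", "")
    if not target:
        # Assumption: no target value means no phasing was requested.
        return []
    if target not in REQUIRED_BY_TARGET:
        # Assumption: only the three targets named above are valid.
        raise ValueError(f"unknown phasing target: {target!r}")
    return [col for col in REQUIRED_BY_TARGET[target] if not row.get(col)]


# Hypothetical row: trio phasing requested, but the second parent is missing.
example_row = {"sample": "sampleA", "target": "trio", "hap1": "father.meryl"}
print(missing_phasing_columns(example_row))
```

Each `row` is a plain dictionary keyed by column name, such as the rows yielded by `csv.DictReader` with `delimiter="\t"`.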
Since Verkko itself is implemented as a Snakemake workflow, you can execute a dry run to check if all input requirements are met by setting the option `verkko_dry_run` to `true`; see this example configuration:
All standard workflows of the CUBI implement the same user interface (or at least aim for a highly similar interface). Hence, before executing the workflow, we strongly recommend reading through the documentation that explains how we help you to keep track of your analysis results; we refer to this concept as "file accounting". This feature of standard CUBI workflows enables the pipeline to automatically create a so-called "manifest" file for your analysis run.
In case of questions, please open a GitHub issue in the repository of the workflow you are trying to execute.
Besides reading the user documentation, CUBI developers can find more information regarding standardized workflow development in the developer notes. Please keep in mind to always cross-link that information with the guidelines published in the CUBI knowledge base.
Please raise any issues with these guidelines "close to the code", i.e., either open an issue in the knowledge base repo or in the affected repo for more specific cases.