This GitHub repository describes the workflow used for benchmarking shallow metagenomic sequencing of mock communities (DNA mixtures), as described in Treichel et al. (bioRxiv).
With this study, we aimed to systematically assess the sequencing depth threshold needed for the read-outs of taxonomic analysis, functional genes and pathways, and MAG construction. We used two complex mixtures of DNA from cultured gut bacteria: an evenly distributed mock community containing DNA of 70 strains and a mock community with a staggered distribution containing DNA of 24 strains. Analyses were performed at nine sequencing depths (0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 5.0, and 10.0 Gb). In addition, library preparation was performed in two facilities, and the effect of background DNA was tested.
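Sub-sampling to a fixed depth means converting each target depth in Gb into a read count. The sketch below does that arithmetic for the nine depths listed above. It assumes paired-end reads of 2 x 150 bp and 1 Gb = 10^9 bases; the read length used in the study is not stated here, so `read_length` is an assumption, and `read_pairs_for_depth` is a hypothetical helper name.

```python
# Sequencing depths (Gb) analysed in the study.
DEPTHS_GB = [0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 5.0, 10.0]


def read_pairs_for_depth(depth_gb: float, read_length: int = 150) -> int:
    """Return the number of read pairs whose combined bases reach depth_gb.

    Assumptions (not taken from the study): paired-end reads of equal
    length (default 2 x 150 bp) and 1 Gb = 1e9 bases. Partial pairs are
    dropped, so the result is rounded down.
    """
    total_bases = round(depth_gb * 1_000_000_000)  # round() guards against float noise
    bases_per_pair = 2 * read_length
    return total_bases // bases_per_pair


if __name__ == "__main__":
    for depth in DEPTHS_GB:
        print(f"{depth:>5} Gb -> {read_pairs_for_depth(depth):>11,} read pairs")
```

With shorter reads, more pairs are needed to reach the same depth, which is why the read length is exposed as a parameter rather than hard-coded.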
Pre-processing
- Sub-sampling of shotgun metagenomic data to an exact number of reads (seqtk)
- Quality filtering and phiX removal (trimmomatic, bbmap, bbduk)
- Assembly into contigs (MegaHit)
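Drawing an exact number of reads from a file of unknown size can be done in a single pass with reservoir sampling. The sketch below illustrates that idea for paired-end FASTQ data; it is not seqtk's code. Mates are kept together by sampling (R1, R2) tuples, which mirrors the common practice of running the sampler on both mate files with the same random seed. The function names and the default seed are hypothetical, and FASTQ sequences are assumed not to be line-wrapped.

```python
import random
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
FastqRecord = Tuple[str, ...]  # (header, sequence, '+' line, qualities)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    block: List[str] = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield tuple(block)
            block = []
    if block:
        raise ValueError("truncated FASTQ record at end of input")


def reservoir_sample(items: Iterable[T], k: int, seed: int) -> List[T]:
    """Draw exactly k items uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item
    if len(reservoir) < k:
        raise ValueError(f"requested {k} reads but input holds only {len(reservoir)}")
    return reservoir


def subsample_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str],
                    k: int, seed: int = 100) -> Tuple[List[FastqRecord], List[FastqRecord]]:
    """Subsample k read pairs, keeping mates together by sampling (R1, R2) tuples.

    zip() stops at the shorter input, so both files are assumed to hold
    the same number of records in the same order.
    """
    pairs = zip(parse_fastq(r1_lines), parse_fastq(r2_lines))
    sampled = reservoir_sample(pairs, k, seed)
    return [r1 for r1, _ in sampled], [r2 for _, r2 in sampled]
```

Reservoir sampling keeps only k records in memory, which matters when the input holds tens of millions of reads; a fixed seed makes each sub-sample reproducible.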
Taxonomic Analysis
- Read mapping to reference genomes and genome coverage (coverM)
- Read count per reference genome / Relative abundance (coverM)
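Relative abundance can be derived from per-genome read counts either as the plain share of mapped reads or after normalising by genome length, since longer genomes attract more reads at equal cell numbers. The sketch below shows both variants on hypothetical inputs; it does not reproduce coverM's own `relative_abundance` calculation, and the function name is an assumption.

```python
from typing import Dict, Optional


def relative_abundance(read_counts: Dict[str, int],
                       genome_lengths: Optional[Dict[str, int]] = None) -> Dict[str, float]:
    """Return per-genome relative abundances as fractions summing to 1.

    Without genome_lengths: each genome's share of the mapped reads.
    With genome_lengths: reads per base of genome, rescaled to sum to 1.
    Unmapped reads are ignored in both cases.
    """
    if genome_lengths is None:
        weights = {g: float(n) for g, n in read_counts.items()}
    else:
        weights = {g: n / genome_lengths[g] for g, n in read_counts.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {g: w / total for g, w in weights.items()}
```

For example, two genomes with 100 mapped reads each but lengths of 1,000 and 2,000 bases get equal read shares (0.5 each), yet length-normalised abundances of about 0.67 and 0.33.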
Functional Analysis
- Protein coding gene prediction (prodigal)
- Alignment to predicted protein sequences of reference genomes (Diamond)
- Completeness of functional pathways (kofamscan, KEGGdecoder)
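Pathway completeness can be thought of as the fraction of pathway steps for which at least one of the alternative KEGG Orthologs (KOs) was detected. The sketch below illustrates that idea; KEGG-Decoder's own pathway definitions and scoring are more detailed. It assumes a two-column `gene<TAB>KO` annotation table in which genes without a KO appear without a second column. The pathway and its KO identifiers (`K_A` to `K_D`) are hypothetical placeholders, not a real KEGG module.

```python
from typing import Iterable, List, Set


def parse_ko_assignments(lines: Iterable[str]) -> Set[str]:
    """Collect KO identifiers from assumed two-column 'gene<TAB>KO' lines.

    Lines with a gene ID but no KO (unassigned genes) are skipped.
    """
    kos: Set[str] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            kos.add(fields[1])
    return kos


def pathway_completeness(steps: List[Set[str]], present: Set[str]) -> float:
    """Fraction of steps for which at least one alternative KO is present."""
    if not steps:
        raise ValueError("pathway has no steps")
    satisfied = sum(1 for alternatives in steps if alternatives & present)
    return satisfied / len(steps)


# Hypothetical three-step pathway; step 2 can be catalysed by either K_B or K_C.
TOY_PATHWAY = [{"K_A"}, {"K_B", "K_C"}, {"K_D"}]
```

With `K_A` and `K_C` detected, two of the three steps are covered, giving a completeness of about 0.67.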
Construction of metagenome-assembled genomes (MAGs)
- Removal of contigs < 1000 bp
- MAG construction (bowtie2, metabat2)
- Evaluation of completeness and contamination (checkM)
- Taxonomic assignment (GTDB-tk)
- MAG composition with respect to reference genomes (blastn)
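Two steps of the MAG workflow reduce to simple rules: discarding contigs shorter than 1000 bp before binning, and grading bins by their CheckM completeness and contamination. The sketch below shows both. The quality thresholds are an assumption that loosely follows the MIMAG completeness/contamination criteria while ignoring its rRNA/tRNA requirements; the cut-offs actually applied in the study may differ. All function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_CONTIG_LEN = 1000  # contigs < 1000 bp are removed before MAG construction


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; handles line-wrapped sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records: Iterable[Tuple[str, str]],
                   min_len: int = MIN_CONTIG_LEN) -> List[Tuple[str, str]]:
    """Keep contigs of at least min_len bases."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def mag_quality(completeness: float, contamination: float) -> str:
    """Grade a bin from CheckM-style percentages (assumed MIMAG-like thresholds)."""
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    if contamination < 10:
        return "low"
    return "contaminated"  # fails the contamination cut-off regardless of completeness
```

Because the length filter uses `>=`, a contig of exactly 1000 bp is retained, matching the rule of removing only contigs below 1000 bp.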
For installation of the required tools, please visit their original websites.
The metagenomic data have been deposited at the European Nucleotide Archive/NCBI and are accessible under project accession PRJEB83573.
Treichel et al. bioRxiv