LongTrack is a novel framework that combines long-read metagenomic assemblies with reliable informatics tailored for strain tracking after fecal microbiota transplantation (FMT).
The core idea of LongTrack has three steps: (1) constructing de novo metagenome-assembled genomes (long-read MAGs) from long-read metagenomic sequencing data generated for the donors' and recipients' pre-FMT samples, (2) selecting strain-specific unique k-mers from the long-read MAGs, and (3) using these unique k-mers together with short-read metagenomic data for precise strain tracking.
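Step (2) can be illustrated with a toy sketch. This is not LongTrack's implementation (LongTrack relies on KMC databases with k=31); it only shows the idea of keeping the k-mers found in one strain and in no other. The function names, the use of canonical k-mers, and the tiny k are assumptions made for illustration, and the code is written for Python 3.

```python
def reverse_complement(seq):
    """Return the reverse complement of an ACGT string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def canonical_kmers(seq, k):
    """Collect canonical k-mers (the smaller of a k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def unique_kmers(genomes, k):
    """Map each strain name to the k-mers that occur in no other strain."""
    per_strain = {name: canonical_kmers(seq, k) for name, seq in genomes.items()}
    unique = {}
    for name, kmers in per_strain.items():
        others = set()
        for other_name, other_kmers in per_strain.items():
            if other_name != name:
                others |= other_kmers
        unique[name] = kmers - others
    return unique


# Hypothetical toy genomes; real MAGs are megabases long.
toy = {"strain1": "ACGTAC", "strain2": "ACGTTT"}
print(unique_kmers(toy, k=4))
```

The shared k-mer `ACGT` is dropped from both strains, so only k-mers that discriminate between them remain.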
conda create -y -n longtrack -c bioconda longtrack
conda activate longtrack
To install manually, ensure you have Python 2 (version 2.7), Bowtie2, and the following Python libraries: numpy >= 1.7.1, HTSeq >= 0.5.3p9, matplotlib-base >= 1.0.0, seaborn >= 0.5.0, pandas >= 0.7.3.
Download the release archive
wget https://github.com/fanglab/LongTrack/releases/download/v1.0.0/LongTrack.v1.0.0.tar.gz
Extract and install
tar zxvf LongTrack.v1.0.0.tar.gz
cd LongTrack.v1.0.0
chmod +x LongTrack
To showcase the toolbox, we provide a demonstration (about 5 minutes in total) that combines two major steps: (1) an illustrative run that performs strain tracking for 5 long-read MAGs across 3 post-FMT samples, and (2) a summary of the strain tracking results.
LongTrack --test >longtrack.log
LongTrack \
--Strain,-s [MAG_dir] \
--kmer,-k [unique_kmer_dir] \
--Metagenome,-m [metagenome_dir] \
--Conflict_table,-c [conflict_table] \
--Output,-o [output_dir] \
[--threads,-t N] \
[--test]
For example
LongTrack -s Data/MAG/ \
-k Data/unique_kmer/ \
-m Data/metagenome/ -c Data/conflict_table \
-o Tracking_results -t 5 >longtrack.log
Basic options
--output, -o
: Output folder, will be created automatically.
--threads, -t
: Number of CPU threads to use (default: 1).
--test
: Runs LongTrack on the test dataset.
Input options
--Strain, -s
: This folder contains the long-read MAGs assembled de novo from the donors, along with the k-mer (k=31) database for each MAG (*_kmcdb) generated by KMC v3.1.0.
Akkermansia_muciniphila_D1.fna
Akkermansia_muciniphila_D1_kmcdb_dump
Akkermansia_muciniphila_D1_kmcdb.kmc_pre
Akkermansia_muciniphila_D1_kmcdb.kmc_suf
…
--Metagenome, -m
: This folder contains the short-read metagenomic data of post-FMT recipients across 3 time points, plus unrelated samples used as negative controls (NC1 and NC2). The data are paired-end: *_sample_PE1.fasta and *_sample_PE2.fasta.
NC1_sample_PE1.fasta
NC1_sample_PE2.fasta
postFMT1W4_sample_PE1.fasta
postFMT1W4_sample_PE2.fasta
…
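Before a run, it can help to confirm that every sample in the metagenome folder has both mates. The sketch below groups file names by the `*_sample_PE1.fasta` / `*_sample_PE2.fasta` pattern described above. The helper name `pair_read_files` is hypothetical and is not part of LongTrack; the code is written for Python 3.

```python
import re
from collections import defaultdict

# Naming pattern taken from the folder description: <sample>_sample_PE<1|2>.fasta
PE_PATTERN = re.compile(r"^(?P<sample>.+)_sample_PE(?P<mate>[12])\.fasta$")


def pair_read_files(filenames):
    """Return {sample: (PE1 file, PE2 file)} for samples that have both mates."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PE_PATTERN.match(name)
        if match:
            mates[match.group("sample")][match.group("mate")] = name
    return {
        sample: (files["1"], files["2"])
        for sample, files in mates.items()
        if "1" in files and "2" in files
    }


listing = [
    "NC1_sample_PE1.fasta",
    "NC1_sample_PE2.fasta",
    "postFMT1W4_sample_PE1.fasta",  # PE2 is missing, so this sample is not returned
]
print(pair_read_files(listing))
```

In practice the list of names could come from `os.listdir` on the metagenome folder; samples absent from the result are the ones with a missing mate.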
--kmer, -k
: This folder contains the unique k-mers from each long-read MAG.
Akkermansia_muciniphila_D1_kmcdb_dump_withpos
…
--Conflict_table, -c
: This file lists, for each sample, the samples it conflicts with (samples that have no relationship to it). For example, negative controls conflict with every sample and are used as no-relationship samples when calculating confidence scores.
postFMT1W4 NC1,NC2
postFMT1W8 NC1,NC2
postFMT1Y5 NC1,NC2
NC1 NC2,postFMT1W4,postFMT1W8,postFMT1Y5
NC2 NC1,postFMT1W4,postFMT1W8,postFMT1Y5
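The conflict table shown above can be read and sanity-checked with a few lines of Python. This is a minimal sketch, assuming each row holds a sample name, whitespace, and a comma-separated list of conflicting samples, as in the example; the exact delimiter used by LongTrack is an assumption, and both helper names are hypothetical. The code is written for Python 3.

```python
def parse_conflict_table(lines):
    """Parse rows of '<sample> <conflict1,conflict2,...>' into a dict of lists."""
    conflicts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        sample, conflict_field = line.split(None, 1)
        conflicts[sample] = [name for name in conflict_field.split(",") if name]
    return conflicts


def undeclared_samples(conflicts):
    """List samples named as a conflict that have no row of their own."""
    referenced = {name for names in conflicts.values() for name in names}
    return sorted(referenced - set(conflicts))


rows = [
    "postFMT1W4 NC1,NC2",
    "NC1 NC2,postFMT1W4",
]
table = parse_conflict_table(rows)
print(table)
print(undeclared_samples(table))  # NC2 is referenced but has no row in this toy input
```

A check like `undeclared_samples` can catch typos in sample names before they affect the confidence scores.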
LongTrack output files
Once the above scripts complete, the following files and figures will be generated in the folders described below.
Strain tracking table: Tracking_results/results_readdistribution_actualreads_confidencescores. Presence (1) or absence (0) of each long-read MAG across the post-FMT samples collected at different time points and the negative controls.
strain NC1 NC2 postFMT1W4 postFMT1W8 postFMT1Y5
Akkermansia_muciniphila_D1 0 0 1 1 1
Alistipes_onderdonkii_D1 0 0 1 1 1
Bifidobacterium_longum.D1.str1 0 0 1 1 1
Bifidobacterium_longum.D1.str2 0 0 1 1 1
Gemmiger_formicilis_D1 0 0 1 1 1
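For downstream analysis, the presence/absence table can be loaded into a nested dictionary. The sketch below assumes the file has exactly the whitespace-separated layout shown above (a header row starting with `strain`, then one row per MAG); the real output file may use a different delimiter or carry extra columns, so treat this as illustrative. The helper names are hypothetical and the code is written for Python 3.

```python
def parse_tracking_table(lines):
    """Return {strain: {sample: 0 or 1}} from a whitespace-separated table."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    samples = header[1:]  # first header cell is the 'strain' label
    return {
        row[0]: {sample: int(value) for sample, value in zip(samples, row[1:])}
        for row in body
    }


def detected_in(table, sample):
    """List strains marked present (1) in the given sample, sorted by name."""
    return sorted(strain for strain, calls in table.items() if calls[sample] == 1)


example = [
    "strain NC1 NC2 postFMT1W4 postFMT1W8 postFMT1Y5",
    "Akkermansia_muciniphila_D1 0 0 1 1 1",
    "Alistipes_onderdonkii_D1 0 0 1 1 1",
]
table = parse_tracking_table(example)
print(detected_in(table, "postFMT1W4"))
print(detected_in(table, "NC1"))
```

Any strain reported as present in a negative control would be worth inspecting, since the controls are expected to show absence.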
Strain tracking is summarized in a heatmap: Tracking_results/Strain_tracking_results.png. Presence (green) or absence (gray) of strains in post-FMT recipients is determined by strain-specific unique k-mers from long-read MAGs.
Yu Fan, Mi Ni, Varun Aggarwala, et al. LongTrack: long read metagenomics-based precise tracking of bacterial strains and their genomic changes after fecal microbiota transplantation. bioRxiv 2024.09.30.615906; doi: https://doi.org/10.1101/2024.09.30.615906