LongTrack is a novel framework that uses long-read metagenomic assemblies and reliable informatics tailored for FMT strain tracking.

fanglab/LongTrack

LongTrack

Description

LongTrack is a novel framework that uses long-read metagenomic assemblies and reliable informatics tailored for FMT strain tracking.

The core idea of LongTrack consists of three steps: (1) constructing de novo metagenome-assembled genomes (long-read MAGs) from long-read metagenomic sequencing data generated for the donors’ and recipients’ samples before FMT, (2) selecting strain-specific unique k-mers from the long-read MAGs, and (3) using these unique k-mers together with short-read metagenomic data for precise strain tracking.
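Step (2) can be illustrated with a small toy sketch. This is not LongTrack's implementation (the README states that the k-mer databases are generated by KMC); the function names `kmers` and `unique_kmers` are hypothetical, the sequences are made up, reverse complements are ignored, and k=4 is used only to keep the example readable. It is written as standalone Python 3 and is independent of the LongTrack code.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def unique_kmers(genomes, k):
    """Map each genome name to the k-mers that occur in no other genome.

    `genomes` is a dict of name -> sequence (hypothetical toy input).
    """
    # Record which genomes contain each k-mer.
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(name)

    # A k-mer is strain-specific if exactly one genome contains it.
    unique = {name: set() for name in genomes}
    for km, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(km)
    return unique

# Toy sequences; the README uses k=31 on real MAGs.
genomes = {"strainA": "ACGTACGGA", "strainB": "ACGTACGTT"}
result = unique_kmers(genomes, k=4)
for name in sorted(result):
    print(name, sorted(result[name]))
```

K-mers shared by both toy strains are discarded, so only the k-mers covering the differing 3' ends remain as markers for each strain.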

Installation

1. Bioconda (Recommended)

conda create -y -n longtrack -c bioconda longtrack
conda activate longtrack

2. Manual Installation

To install manually, ensure you have Python 2 (version 2.7), Bowtie2, and the following Python libraries: numpy >= 1.7.1, HTSeq >= 0.5.3p9, matplotlib-base >= 1.0.0, seaborn >= 0.5.0, pandas >= 0.7.3.

Download the release archive

wget https://github.com/fanglab/LongTrack/releases/download/v1.0.0/LongTrack.v1.0.0.tar.gz

Extract and install

tar zxvf LongTrack.v1.0.0.tar.gz
cd LongTrack.v1.0.0
chmod +x LongTrack

Quick start

Run the test data:

To showcase the toolbox, we provide the following demonstration (about 5 minutes in total), which integrates two major steps: 1) an illustrative run that performs strain tracking for 5 long-read MAGs across 3 post-FMT samples; 2) a summary of the strain tracking results.

LongTrack --test  >longtrack.log

Usage:

  LongTrack \
  --Strain,-s [MAG_dir] \
  --kmer,-k [unique_kmer_dir] \
  --Metagenome,-m [metagenome_dir] \
  --Conflict_table,-c [conflict_table] \
  --Output,-o [output_dir] \
  [--threads,-t N] \
  [--test]

For example:

  LongTrack -s Data/MAG/ \
  -k Data/unique_kmer/ \
  -m Data/metagenome/ -c Data/conflict_table \
  -o Tracking_results -t 5 >longtrack.log

Basic options

--output, -o : Output folder; it is created automatically if it does not exist.

--threads, -t : Number of CPU threads to use (default: 1).

--test : Runs LongTrack on the test dataset.

Input options

--Strain, -s : This folder contains the long-read MAGs that were de novo assembled from the donors, together with the k-mer (k=31) database for each MAG (*_kmcdb) generated by KMC v3.1.0.

Akkermansia_muciniphila_D1.fna
Akkermansia_muciniphila_D1_kmcdb_dump
Akkermansia_muciniphila_D1_kmcdb.kmc_pre
Akkermansia_muciniphila_D1_kmcdb.kmc_suf
…

--Metagenome, -m : This folder contains the short-read metagenomic data of post-FMT recipients across 3 time points, plus unrelated samples used as negative controls (NC1 and NC2). Data are paired-end: *_sample_PE1.fasta and *_sample_PE2.fasta.

NC1_sample_PE1.fasta
NC1_sample_PE2.fasta
postFMT1W4_sample_PE1.fasta
postFMT1W4_sample_PE2.fasta
…
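As a pre-flight check, the paired-end naming convention above can be verified before a run. The sketch below is a hypothetical helper, not part of LongTrack: the regular expression and the function name `pair_reads` are assumptions derived only from the `*_sample_PE1.fasta` / `*_sample_PE2.fasta` pattern shown here.

```python
import re
from collections import defaultdict

# Assumed naming pattern, taken from the file names listed in this README.
PE_PATTERN = re.compile(r"^(?P<sample>.+)_sample_PE(?P<mate>[12])\.fasta$")

def pair_reads(filenames):
    """Group file names by sample and return only the complete PE1/PE2 pairs."""
    mates_by_sample = defaultdict(dict)
    for fn in filenames:
        m = PE_PATTERN.match(fn)
        if m:  # silently skip files that do not follow the pattern
            mates_by_sample[m.group("sample")][m.group("mate")] = fn
    return {
        sample: (mates["1"], mates["2"])
        for sample, mates in mates_by_sample.items()
        if set(mates) == {"1", "2"}
    }

files = [
    "NC1_sample_PE1.fasta",
    "NC1_sample_PE2.fasta",
    "postFMT1W4_sample_PE1.fasta",
    "postFMT1W4_sample_PE2.fasta",
]
pairs = pair_reads(files)
for sample in sorted(pairs):
    print(sample, pairs[sample])
```

In practice the list could come from `os.listdir` on the metagenome folder; a sample missing from the result indicates an unpaired or misnamed file.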

--kmer, -k : This folder contains the unique k-mers selected from each long-read MAG.

Akkermansia_muciniphila_D1_kmcdb_dump_withpos
…

--Conflict_table, -c : This file lists, for each sample, the samples it conflicts with (samples that have no relationship to it). For example, negative controls conflict with every other sample. These no-relationship samples are used to calculate confidence scores.

postFMT1W4	NC1,NC2
postFMT1W8	NC1,NC2
postFMT1Y5	NC1,NC2
NC1	NC2,postFMT1W4,postFMT1W8,postFMT1Y5
NC2	NC1,postFMT1W4,postFMT1W8,postFMT1Y5
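The conflict table layout shown above (sample name, whitespace, comma-separated list of conflicting samples) can be read with a few lines of Python. This is a minimal sketch under that assumed layout; `parse_conflict_table` and `asymmetric_pairs` are hypothetical helpers, and the symmetry check is only a sanity check suggested by the example, not a documented LongTrack requirement.

```python
def parse_conflict_table(text):
    """Parse lines of '<sample> <c1,c2,...>' into a dict of sample -> list."""
    conflicts = {}
    for line in text.splitlines():
        fields = line.split()  # tolerates tabs and stray spaces
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        sample, others = fields
        conflicts[sample] = others.split(",")
    return conflicts

def asymmetric_pairs(conflicts):
    """Return (a, b) pairs where a lists b as a conflict but b does not list a."""
    return [
        (a, b)
        for a, others in conflicts.items()
        for b in others
        if a not in conflicts.get(b, [])
    ]

example = """\
postFMT1W4\tNC1,NC2
postFMT1W8\tNC1,NC2
postFMT1Y5\tNC1,NC2
NC1\tNC2,postFMT1W4,postFMT1W8,postFMT1Y5
NC2\tNC1,postFMT1W4,postFMT1W8,postFMT1Y5
"""
table = parse_conflict_table(example)
print(table["postFMT1W4"])
print(asymmetric_pairs(table))
```

In the example table every conflict is listed in both directions, so the second call reports no asymmetric pairs.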

LongTrack output files

Once the above scripts complete, the following files and figures will be generated in the folders described below.

Strain tracking table: Tracking_results/results_readdistribution_actualreads_confidencescores. It reports the presence (1) or absence (0) of each long-read MAG in the post-FMT samples collected at different time points and in the negative controls.

strain	NC1	NC2	postFMT1W4	postFMT1W8	postFMT1Y5
Akkermansia_muciniphila_D1	0	0	1	1	1
Alistipes_onderdonkii_D1	0	0	1	1	1
Bifidobacterium_longum.D1.str1	0	0	1	1	1
Bifidobacterium_longum.D1.str2	0	0	1	1	1
Gemmiger_formicilis_D1	0	0	1	1	1
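For downstream analysis, the presence/absence table can be loaded with the Python standard library. The sketch below assumes the file is tab-delimited with a header row as shown above; the function name `present_strains` is hypothetical, and cells are stripped in case names carry stray spaces.

```python
import csv
import io

def present_strains(table_text):
    """Return a dict of sample -> list of strains called present (1)."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = [cell.strip() for cell in next(reader)]
    samples = header[1:]  # first column holds the strain name
    presence = {sample: [] for sample in samples}
    for row in reader:
        if not row:
            continue  # skip empty lines
        strain = row[0].strip()
        for sample, call in zip(samples, row[1:]):
            if call.strip() == "1":
                presence[sample].append(strain)
    return presence

example = (
    "strain\tNC1\tNC2\tpostFMT1W4\n"
    "Akkermansia_muciniphila_D1\t0\t0\t1\n"
    "Alistipes_onderdonkii_D1\t0\t0\t1\n"
)
calls = present_strains(example)
for sample, strains in calls.items():
    print(sample, strains)
```

To use it on real output, read the contents of Tracking_results/results_readdistribution_actualreads_confidencescores into a string and pass it to the function.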

Strain tracking is summarized in a heatmap: Tracking_results/Strain_tracking_results.png. Presence (green) or absence (gray) of strains in post-FMT recipients determined by strain-specific unique k-mers from long-read MAGs.

Citations

Yu Fan, Mi Ni, Varun Aggarwala, et al. LongTrack: long read metagenomics-based precise tracking of bacterial strains and their genomic changes after fecal microbiota transplantation. bioRxiv 2024.09.30.615906; doi: https://doi.org/10.1101/2024.09.30.615906
