LTR_Stream

LTR_Stream performs sub-lineage level clustering of LTR-RTs in closely related species. The resulting clusters provide genetic markers for genome comparison and for studying the modular evolution of LTR-RTs in the host genome. It takes as input the nucleotide sequences of intact LTR-RTs belonging to the same LTR-lineage. A mix of LTR-RTs from different LTR-lineages is theoretically acceptable but not recommended. LTR_Stream assigns each LTR-RT a cluster label and automatically evaluates the reliability of each cluster.

Graphical Abstract

Sub-lineage Clustering of Retand LTR-RTs from Three Papaver Species

Installation

Requirements

1. Conda

Conda version 23.1.0 or later is required.
Mamba is recommended to speed up conda.

2. Git

Git version 2.34.1 or later is required.
Configure an SSH key for GitHub and make sure that git clone over SSH works.
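
The version requirements above can be checked with a short script. This is a minimal sketch and not part of LTR_Stream: `version_ge` and `report_version` are hypothetical helper names, and the script assumes that `sort -V` (GNU coreutils) is available and that `conda --version` and `git --version` print the version as their second and third word respectively.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes `sort -V` (version sort, GNU coreutils) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool meets its minimum version.
# Arguments: tool name, detected version, minimum version.
report_version() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: OK (>= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
    fi
}

# `conda --version` prints e.g. "conda 23.1.0".
if command -v conda >/dev/null 2>&1; then
    report_version conda "$(conda --version | awk '{print $2}')" 23.1.0
else
    echo "conda: not found"
fi

# `git --version` prints e.g. "git version 2.34.1".
if command -v git >/dev/null 2>&1; then
    report_version git "$(git --version | awk '{print $3}')" 2.34.1
else
    echo "git: not found"
fi
```

SSH access to GitHub can be tested separately with `ssh -T git@github.com`; GitHub replies with a greeting when the key is accepted.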

Install LTR_Stream

1. Clone LTR_Stream from GitHub.

ltrStreamInstallPath=path_you_want_to_install_LTR_Stream
cd ${ltrStreamInstallPath} && git clone git@github.com:xjtu-omics/LTR_Stream.git

2. Run the install script.

If mamba is not available, please run:

cd ${ltrStreamInstallPath}/LTR_Stream && bash Init_LTR_Stream_Env.sh

For a faster installation with mamba, please run:

cd ${ltrStreamInstallPath}/LTR_Stream && bash Init_LTR_Stream_Env.sh mamba

Quick start

Sub-lineage level LTR-RT clustering

conda activate ltrStream
cd ${ltrStreamInstallPath}/LTR_Stream/src
snakemake -s LTR_Stream.smk -f stream --config ltrParaFile=path_of_ltrPara.tsv -j {threadsNumber}
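
Before a first run, it can help to confirm that the file passed as ltrParaFile points at a blank workDir and at a FASTA file, as the configuration section requires. The sketch below is one possible pre-flight check under stated assumptions: `get_ltr_param` and `preflight_ltr_stream` are hypothetical helpers (not part of LTR_Stream), the parameter file is assumed to hold one tab-separated key and value per line, and workDir is assumed to exist already and to be empty.

```shell
# Hypothetical helper: print the value of a key from a tab-separated parameter file.
# Arguments: parameter file, key.
get_ltr_param() {
    awk -F '\t' -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# Hypothetical pre-flight check; returns non-zero at the first problem found.
# Argument: parameter file.
preflight_ltr_stream() {
    _ltr_work=$(get_ltr_param "$1" workDir)
    _ltr_fasta=$(get_ltr_param "$1" ltrFasta)
    [ -n "$_ltr_work" ] || { echo "workDir is not set" >&2; return 1; }
    [ -n "$_ltr_fasta" ] || { echo "ltrFasta is not set" >&2; return 1; }
    # Assumption: workDir must already exist and contain nothing.
    [ -d "$_ltr_work" ] || { echo "workDir does not exist: $_ltr_work" >&2; return 1; }
    [ -z "$(ls -A "$_ltr_work")" ] || { echo "workDir is not empty: $_ltr_work" >&2; return 1; }
    # A FASTA file is non-empty and starts with a '>' header line.
    [ -s "$_ltr_fasta" ] || { echo "ltrFasta is missing or empty: $_ltr_fasta" >&2; return 1; }
    [ "$(head -c 1 "$_ltr_fasta")" = ">" ] || { echo "ltrFasta does not start with '>'" >&2; return 1; }
}

# Usage (the path is a placeholder):
#   preflight_ltr_stream path_of_ltrPara.tsv && echo "ready to run snakemake"
```

The emptiness check only makes sense before the first run; once LTR_Stream has written into workDir, the parameter-tuning re-runs reuse that directory.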

Config file `ltrPara.tsv`

LTR_Stream runs according to the parameters set in this TSV (tab-separated values) file, so please make sure all parameters are set here before you start LTR_Stream.smk. (You can choose any file name and path; this documentation refers to the configuration file as ltrPara.tsv.) The following is an example of the file. Lines beginning with # are comments. The values given for optional parameters in this example are their defaults in LTR_Stream. To make tuning easier, the parameters that most strongly affect the clustering results are introduced first. A standard example of this file is under examples/.

# An example for ltrPara.tsv
# All fields are tab-separated.

# Mandatory parameters
# workDir: A blank directory for running LTR_Stream.
# The outputs of LTR_Stream are written to workDir/figure.
workDir	/xx/xx/xx

# ltrFasta: The nucleotide sequences of the LTR-RT set you want to 
# analyze. Please ensure it is in standard FASTA format.
ltrFasta    /xx/xx/xx.fa


# Optional parameters

# Important parameters
# minOverLapForNovelModule: Controls the number and dispersion of module sequences in the 3-D space.
# It is used in the disjoint-set data structure to decide whether there should be an edge between two
# alignment regions. It can be set in the range 0 to 1. A greater minOverLapForNovelModule leads to
# more module sequences and a more dispersed result. Default is 0.8.
minOverLapForNovelModule	0.8


# topModNum: Together with minOverLapForNovelModule, controls the number and dispersion of module
# sequences. A greater topModNum leads to more module sequences and a more dispersed result.
# LTR_Stream outputs a plot of module number versus covered LTR-RTs (coverLine.pdf under
# workDir/figure). Set topModNum large enough that about 80% of LTR-RTs have 2-3 modules. It is
# estimated that topModNum should be in the range 200-800. A larger minOverLapForNovelModule usually
# calls for a larger topModNum, so adjust the two parameters together. Default is 250.
topModNum	250


# tsneEarlyExaggeration: A crucial parameter in t-SNE dimensionality reduction that directly affects
# the results. An excessively large tsneEarlyExaggeration produces a linear shape in the
# three-dimensional space, while an excessively small tsneEarlyExaggeration leads to a dispersed
# distribution that hinders sub-lineage identification. It is estimated that tsneEarlyExaggeration
# should be in the range 6-9. Default is 6.
tsneEarlyExaggeration	6


# tsnePerplexity: Larger tsnePerplexity will provide more robust results, while a smaller 
# tsnePerplexity will yield more detailed clustering results. Depending on the size of the dataset, 
# it is not recommended to set tsnePerplexity to less than 3% of the module sequence count for larger 
# datasets, or less than 15 for smaller datasets. Default is 100.
tsnePerplexity  100


# cluCentCut: A parameter used to assess the degree of intra-class distribution aggregation in 3D
# space. A larger cluCentCut will result in coarser clustering. If LTR_Stream indicates clustering 
# failure, please increase this parameter within the range of 0-1. Default is 0.1.
cluCentCut  0.1


# maxZoomInLevel: LTR_Stream achieves fine clustering of LTR-RT in complex scenarios through 
# iterative expansion. This parameter controls the maximum depth of iterative expansion. If you find
# that the number of clusters is too large or some categories within subviews are verified as 
# unreliable, you can set a maximum limit. The default value is -1, which means no limit is set. 
maxZoomInLevel  -1

# minClusterSize: Clusters containing fewer module sequences than this threshold are considered noise
# and filtered out. If the number of clustered LTR-RTs is low, it is recommended to reduce the
# threshold accordingly. Default is 50.
minClusterSize	50

# Other parameters

# tsneLearningRate: For t-SNE dimensionality reduction, LTR_Stream requires a very small learning rate, 
# with a default value of 6. It is not recommended to set this value higher than 8.
tsneLearningRate    6



# blastEvalue: Used for homology searching in BLASTn. Default is 1e-10. If the LTR-RT sequence set to 
# be analyzed has particularly high similarity, you can reduce this parameter accordingly.
blastEvalue 1e-10


# Parameters used in ElPiGraph
epgLambda	0.01
epgMu	0.01
epgAlpha	0.05
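
Because every key and value must be separated by a real tab, and some editors silently convert tabs to spaces, it can be safer to generate or validate the file from the shell. The sketch below is illustrative only: `write_ltr_para` and `check_ltr_para` are hypothetical helpers, and the rule that each non-comment line holds exactly two tab-separated fields is an assumption drawn from the example above, not a documented constraint of LTR_Stream.

```shell
# Hypothetical helper: write a minimal ltrPara.tsv containing only the two
# mandatory parameters, using real tab separators.
# Arguments: output file, workDir, ltrFasta.
write_ltr_para() {
    {
        printf 'workDir\t%s\n' "$2"
        printf 'ltrFasta\t%s\n' "$3"
    } > "$1"
}

# Hypothetical check: every line that is neither blank nor a comment is
# expected to hold exactly two tab-separated fields (assumption).
# Prints the offending line numbers and returns non-zero if any are found.
check_ltr_para() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        !/^#/ && NF > 0 && NF != 2 {
            printf "line %d: expected key<TAB>value\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage (paths are placeholders):
#   write_ltr_para ltrPara.tsv /xx/xx/xx /xx/xx/xx.fa
#   check_ltr_para ltrPara.tsv && echo "ltrPara.tsv is tab-separated"
```

Optional parameters that are left out of the file fall back to the defaults listed in the example.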

Parameter adjustment

Because datasets differ in size and in the degree of internal sequence consistency, the parameters of LTR_Stream need to be adjusted for each dataset. The two most critical parameters are minOverlap (the minOverLapForNovelModule entry in ltrPara.tsv) and tsnePerplexity. LTR_Stream provides intermediate visualizations to assist with parameter tuning. Before tuning the parameters, please run LTR_Stream.smk with the default settings. If no clustering results are produced or the results are unsatisfactory, proceed with parameter adjustment, beginning with an appropriate value for minOverlap.

minOverlap adjustment

Based on our testing, suitable values of minOverlap typically fall in the range 0.75 to 0.99. After modifying this parameter, please re-run the following command, which generates the file figure/coverLine.pdf. Use this file to adjust minOverlap accordingly. A specific example with guidance is shown in the figure below.

snakemake -s LTR_Stream.smk -f staNovelSelectedNumVsCovered -R selectNovelUnits --config ltrParaFile=/path/to/ltrPara.tsv -j {threads}

After setting an appropriate minOverlap value, please proceed to adjust tsnePerplexity.
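
When trying several values, editing the parameter file by hand risks breaking its tab separators. The following sketch shows one way to change a single entry from the shell before re-running the command above. `set_ltr_param` is a hypothetical helper, not part of LTR_Stream; it assumes the file is tab-separated and that minOverlap corresponds to the minOverLapForNovelModule entry.

```shell
# Hypothetical helper: set a key to a value in a tab-separated parameter file.
# An existing entry is replaced in place; a missing key is appended.
# Comment lines and all other entries are kept unchanged.
# Arguments: parameter file, key, value.
set_ltr_param() {
    _ltr_tmp=$(mktemp) || return 1
    awk -F '\t' -v OFS='\t' -v k="$2" -v v="$3" '
        $1 == k { print k, v; found = 1; next }   # replace the existing entry
        { print }                                  # keep every other line as-is
        END { if (!found) print k, v }             # append when the key was absent
    ' "$1" > "$_ltr_tmp" || { rm -f "$_ltr_tmp"; return 1; }
    # Copy the content back so the original file keeps its permissions.
    cat "$_ltr_tmp" > "$1"
    rm -f "$_ltr_tmp"
}

# Usage (the path is a placeholder):
#   set_ltr_param /path/to/ltrPara.tsv minOverLapForNovelModule 0.85
#   then re-run the snakemake command above and inspect figure/coverLine.pdf
```

An entry written with spaces instead of a tab is not recognized as a key by this helper, so the new value would be appended as a second line; validating the separators first avoids that.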

tsnePerplexity adjustment

The appropriate tsnePerplexity depends mainly on dataset size, so it has no fixed optimal range. For datasets with around 10,000 sequences, a value between 100 and 200 may be appropriate. For smaller datasets with only a few hundred sequences, values between 10 and 100 are typically suitable. After each adjustment, please run the following command, which updates the file figure/tsneDistance.3d.gif. Please adjust this parameter based on the example provided in the figure below.

snakemake -s LTR_Stream.smk -f tsnePlot -R mergeModules --config ltrParaFile=/path/to/ltrPara.tsv -j {threads}

After completing these two parameter tuning steps, please run the following command to perform clustering.

snakemake -s LTR_Stream.smk -f stream -R stream --config ltrParaFile=/path/to/ltrPara.tsv -j {threads}

Outputs

All outputs are saved in workDir/figure.

workDir/figure/**.gif

GIF files showing clustering results in each 3D-subview.

workDir/figure/clusterRel.tsv

TSV file recording final cluster results.

workDir/figure/classInfo.tsv

TSV file recording details of clustering including coordinate information in each subview.

workDir/figure/clusterNuclVali.tsv

TSV file recording, for each cluster, the fold change between inter- and intra-cluster distance and the corresponding significance. A fold change significantly greater than one indicates a reliable cluster.

workDir/figure/coverLine.pdf

Line plot showing the module number and the corresponding percentage of covered LTR-RTs. Used to guide parameter adjustment.
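
A quick way to get an overview of a result table such as clusterRel.tsv is to count how many rows carry each cluster label. The sketch below is a generic TSV tally, not an LTR_Stream tool: `count_per_cluster` is a hypothetical helper, and the column that holds the cluster label is an assumption that should be checked against your own file (a header row, if present, is counted as a label of its own).

```shell
# Hypothetical helper: count rows per distinct value in one column of a TSV file.
# Arguments: TSV file, 1-based index of the column assumed to hold the cluster label.
# Prints "label<TAB>count" lines sorted by label.
count_per_cluster() {
    awk -F '\t' -v c="$2" '
        !/^#/ && NF >= c { n[$c]++ }                      # tally each label
        END { for (k in n) printf "%s\t%d\n", k, n[k] }   # one line per label
    ' "$1" | sort
}

# Usage (path and column index are placeholders):
#   count_per_cluster workDir/figure/clusterRel.tsv 2
```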

Citation

Xu, Tun, et al. "Deciphering complex interactions between LTR retrotransposons and three Papaver species using LTR_Stream." Genomics, Proteomics & Bioinformatics (2025): qzaf061.
