Commit 6a557b7

Spin out tsplit (#17)
* use external tsplit module
* ignore dev dir
* bump version
* mv dev deps up
1 parent 4f6e489 commit 6a557b7

5 files changed: +32 -496 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ temp/
 test/
 pkg-mgt/
 devnotes/
+.ideas/

 # Mac stuff
 .DS_Store

CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 cff-version: 1.2.0
 message: "If you use this software, please cite it as below."
 title: "TIRmite: Annotation of cryptic DNA-transposon variants with Hidden Markov Models."
-version: 1.1.7
+version: 1.2.0
 date-released: 2025-03-22
 authors:
 - family-names: Taranto

README.md

Lines changed: 25 additions & 124 deletions
@@ -8,31 +8,23 @@

 # TIRmite

-Build and map profile Hidden Markov Models for Terminal Inverted Repeat
-families (TIR-pHMMs) to genomic sequences for annotation of MITES and complete
+Build and map profile Hidden Markov Models for Terminal Inverted Repeat
+families (TIR-pHMMs) to genomic sequences for annotation of MITES and complete
 DNA-Transposons with variable internal sequence composition.

-
-TIRmite is packaged with *tSplit* a tool for extraction of terminal repeats
-from complete transposons.
+If you have a draft TE model (i.e. from RepeatModeler or EDTA) and want to identify the TIR's to use with TIRmite - we recommend using [*tSplit*](https://github.com/Adamtaranto/TE-splitter/) a tool for extraction of terminal repeats from complete transposons.

 # Table of contents

 * [About TIRmite](#about-tirmite)
 * [Algorithm overview](#algorithm-overview)
 * [Options and usage](#options-and-usage)
-* [Installing TIRmite](#installing-tirmite)
-* [Example usage](#example-usage)
-* [Standard options](#standard-options)
-* [Custom DNA matrices](#custom-dna-matrices)
-* [Additional tools](additional-tools)
-* [tSplit](tsplit)
-* [tSplit algorithm overview](tsplit-algorithm-overview)
-* [tSplit options and usage](tsplit-options-and-usage)
+* [Installing TIRmite](#installing-tirmite)
+* [Example usage](#example-usage)
+* [Standard options](#standard-options)
+* [Custom DNA matrices](#custom-dna-matrices)
 * [Issues](#issues)
 * [License](#license)
-* [Logo](#logo)
-

 ## About TIRmite

@@ -41,9 +33,10 @@ genome-wide annotation of TIR families. These can be provided by the user or
 built from aligned TIRs oriented as 5' outer edge --> 3' inner edge.

 Three classes of output are produced:
+
 1. All significant TIR hit sequences written to fasta (per query HMM).
 2. Candidate elements comprised of paired TIRs are written to fasta (per query HMM).
-3. Genomic annotations of candidate elements and, optionally, TIR hits
+3. Genomic annotations of candidate elements and, optionally, TIR hits
 (paired and unpaired) are written as a single GFF3 file.

 ## Algorithm overview
@@ -67,13 +60,14 @@ Three classes of output are produced:

 TIRmite requires Python >= v3.8

-Dependencies:
-- TIR-pHMM build and search
-* [HMMER3](http://hmmer.org)
-- Extract terminal repeats from predicted TEs
-* [pymummer](https://github.com/sanger-pathogens/pymummer) version >= 0.10.3 with wrapper for nucmer option *--diagfactor*.
-* [MUMmer](https://github.com/mummer4/mummer)
-* [BLAST+](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) (Optional)
+Dependencies:
+
+* TIR-pHMM build and search
+* [HMMER3](http://hmmer.org)
+* Extract terminal repeats from predicted TEs
+* [pymummer](https://github.com/sanger-pathogens/pymummer) version >= 0.10.3 with wrapper for nucmer option *--diagfactor*.
+* [MUMmer](https://github.com/mummer4/mummer)
+* [BLAST+](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) (Optional)

 You can create a Conda environment with these dependencies using the YAML files in this repo.

@@ -106,6 +100,7 @@ Install latest release from PyPi.
 ```

 Install from Bioconda.
+
 ```bash
 % conda install -c bioconda tirmite
 ```
@@ -123,16 +118,15 @@ Test installation.
 ```bash
 # Print version number and exit.
 % tirmite --version
-tirmite 1.1.6
+tirmite 1.2.0

 # Get usage information
 % tirmite --help
 ```

 ### Example usage

-Report all hits and valid pairings of TIR_A in target.fasta (interval <= 10000, hits cover > 40% len of hmm model),
-and write GFF3 annotation file.
+Report all hits and valid pairings of TIR_A in target.fasta (interval <= 10000, hits cover > 40% len of hmm model), and write GFF3 annotation file.

 ```bash
 % tirmite --genome target.fasta --hmmFile TIR_A.hmm --gffOut TIR_elements_in_Target.gff3 --maxdist 10000 --mincov 0.4
@@ -148,11 +142,12 @@ In this example the two TIRs should be oriented to begin with "GA".

 5\` **GA\>\>\>\>\>\>\>** ATGC <<<<<<<TC 3\`
 3\` CT>>>>>>>> TACG <<<<<<<AG 5\`
+
 ### Standard options

 Run `tirmite --help` to view the program's most commonly used options:

-```
+```code
 tirmite [-h] [--version] --genome GENOME [--hmmDir HMMDIR]
 [--hmmFile HMMFILE] [--alnDir ALNDIR] [--alnFile ALNFILE]
 [--alnFormat {clustal,fasta,nexus,phylip,stockholm}]
@@ -230,107 +225,13 @@ Non-standard HMMER paths:

 ### Custom DNA Matrices

-nhmmer can be supplied with custom DNA score matrices for assessing hmm match scores.
-Standard NCBI-BLAST matrices such as NUC.4.4 are compatible. (See: ftp://ftp.ncbi.nlm.nih.gov/blast/matrices/NUC.4.4)
-
-## Additional tools
-
-### tSplit
-
-Extract Terminal Inverted Repeats (TIRs) DNA transposons.
-
-### tSplit algorithm overview
-
-tSplit attempts to identify terminal repeats in transposable elements by
-first aligning each element to itself using nucmer, and then applying a set of
-tuneable heuristics to select an alignment pair most likely to represent a TIR.
-
-1. Exclude all diagonal/self-matches
-2. If tsplit-TIR: Retain only alignment pairs on opposite strands (inverse repeats)
-3. Retain pairs for which the 5' match begins within x bases of element start
-and whose 3' match ends within x bases of element end
-4. Exclude alignment pairs which overlap (potential SSRs)
-5. If multiple candidates remain select alignment pair with largest internal segment
-(i.e. closest to element ends)
-
-### tSplit options and usage
-
-### tSplit example usage
-
-For each element in *dna-transposons.fasta* split into internal and external (TIR) segments.
-Split segments will be written to *TIR_split_TE-splitter_output.fasta* with suffix "_I" for
-internal or "_TIR" for external segments. TIRs must be at least 10bp in length and share 80%
-identity and occur within 10bp of each end of the input element. Additionally, synthetic
-MITEs will be constructed by concatenation of left and right TIRs, with internal segments
-excised.
-
-
-```bash
-% tsplit-TIR -i dna-transposons.fasta -p TIR_split
-```
-
-### tSplit options
-
-Run `tsplit-TIR --help` to view the programs' most commonly used
-options:
-
-```
-Usage: tsplit-TIR [-h] -i INFILE [-p PREFIX] [-d OUTDIR]
-[--splitmode {all,split,internal,external,None}]
-[--makemites] [--keeptemp] [-v] [-m MAXDIST]
-[--minid MINID] [--minterm MINTERM] [--minseed MINSEED]
-[--diagfactor DIAGFACTOR] [--method {blastn,nucmer}]
-
-Help:
--h, --help Show this help message and exit.
-
-Input:
--i, --infile Multifasta containing complete elements.
-(Required)
-
-Output:
--p, --prefix All output files begin with this string. (Default:[infile basename])
--d, --outdir Write output files to this directory. (Default: cwd)
---keeptemp If set do not remove temp directory on completion.
--v, --verbose If set, report progress.
-
-Report settings:
---splitmode Options: {all,split,internal,external,None}
-all = Report input sequence as well as internal and external segments.
-split = Report internal and external segments after splitting.
-internal = Report only internal segments.
-external = Report only terminal repeat segments.
-None = Only report synthetic MITES (when --makemites is also set).
-(Default: split)
---makemites Experimental function: Attempt to construct synthetic MITE sequences from TIRs by concatenating
-5' and 3' TIRs. Available only in 'tsplit-TIR' mode
-
-Alignment settings:
---method Select alignment tool. Note: blastn may perform better on very short high-identity TRs,
-while nucmer is more robust to small indels.
-Options: {blastn,nucmer}
-(Default: nucmer)
---minid Minimum identity between terminal repeat pairs. As float.
-(Default: 80.0)
---minterm Minimum length for a terminal repeat to be considered.
-Equivalent to nucmer "--mincluster"
-(Default: 10)
--m, --maxdist Terminal repeat candidates must be no more than this many bases from ends of an input element.
-Note: Increase this value if you suspect that your element is nested within some flanking sequence.
-(Default: 10)
---minseed Minimum length of a maximal exact match to be included in final match cluster.
-Equivalent to nucmer "--minmatch".
-(Default: 5)
---diagfactor Maximum diagonal difference factor for clustering of matches within nucmer,
-i.e. diagonal difference / match separation
-(default 0.20)
-Note: Increase value for greater tolerance of indels between terminal repeats.
-```
+nhmmer can be supplied with custom DNA score matrices for assessing hmm match scores.
+Standard NCBI-BLAST matrices such as NUC.4.4 are compatible. (See: ftp://ftp.ncbi.nlm.nih.gov/blast/matrices/NUC.4.4)

 ## Issues

 Submit feedback to the [Issue Tracker](https://github.com/Adamtaranto/TIRmite/issues)

 ## License

-Software provided under MIT license.
+Software provided under MIT license.

pyproject.toml

Lines changed: 5 additions & 6 deletions
@@ -19,7 +19,7 @@ classifiers = [
 "License :: OSI Approved :: MIT License",
 ]

-dependencies = ["pandas>=0.23.4", 'biopython>=1.70', "pymummer>=0.10.3",]
+dependencies = ["pandas>=0.23.4", 'biopython>=1.70', "pymummer>=0.10.3", "tsplit"]

 dynamic = ["version"]

@@ -30,7 +30,10 @@ repository = "https://github.com/adamtaranto/TIRmite"

 [project.scripts]
 tirmite="tirmite.cmd_tirmite:main"
-tsplit-TIR="tirmite.cmd_TIR:main"
+
+# Optional dependencies for testing
+[project.optional-dependencies]
+dev = ["hatch", "isort", "ipykernel", "numpydoc-validation", "pre-commit", "pytest", "pytest-cov", "ruff"]

 [tool.hatch.build]
 source = "src"
@@ -51,10 +54,6 @@ fallback-version = "0.0.0"
 [tool.hatch.build.hooks.vcs]
 version-file = "src/tirmite/_version.py"

-# Optional dependencies for testing
-[project.optional-dependencies]
-dev = ["hatch", "isort", "ipykernel", "numpydoc-validation", "pre-commit", "pytest", "pytest-cov", "ruff"]
-
 [tool.pytest.ini_options]
 addopts = "-v --cov --cov-branch --cov-report=xml --cov-report=term"
 testpaths = ["tests"]
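
Note on the pyproject.toml change: the `tsplit-TIR` entry is dropped from `[project.scripts]` and `tsplit` is added to `dependencies`, so after upgrading, any `tsplit-TIR` command would have to come from the external package rather than from TIRmite. A minimal sketch for checking which console scripts each installed distribution provides is given below. It assumes the distribution names are `tirmite` and `tsplit`, and makes no claim about which script names the external `tsplit` package actually exposes. The helper `console_scripts` is a hypothetical name used only for illustration.

```python
from importlib.metadata import PackageNotFoundError, distribution


def console_scripts(dist_name: str) -> dict:
    """Map console-script names to their targets for an installed distribution.

    Returns an empty dict when the distribution is not installed.
    """
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return {}
    # Keep only entry points registered under the "console_scripts" group.
    return {
        ep.name: ep.value
        for ep in dist.entry_points
        if ep.group == "console_scripts"
    }


if __name__ == "__main__":
    # Distribution names are assumptions taken from the pyproject.toml diff.
    for name in ("tirmite", "tsplit"):
        scripts = console_scripts(name)
        print(name, "->", scripts or "not installed, or no console scripts")
```

Running this in the environment where TIRmite 1.2.0 is installed should show whether `tsplit-TIR` is still listed under `tirmite`, which would suggest a stale install.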
