
# TIRmite

- Build and map profile Hidden Markov Models for Terminal Inverted Repeat
- families (TIR-pHMMs) to genomic sequences for annotation of MITES and complete
+ Build and map profile Hidden Markov Models for Terminal Inverted Repeat
+ families (TIR-pHMMs) to genomic sequences for annotation of MITES and complete
DNA-Transposons with variable internal sequence composition.

-
- TIRmite is packaged with *tSplit* a tool for extraction of terminal repeats
- from complete transposons.
+ If you have a draft TE model (i.e. from RepeatModeler or EDTA) and want to identify the TIR's to use with TIRmite - we recommend using [*tSplit*](https://github.com/Adamtaranto/TE-splitter/) a tool for extraction of terminal repeats from complete transposons.

# Table of contents

* [About TIRmite](#about-tirmite)
* [Algorithm overview](#algorithm-overview)
* [Options and usage](#options-and-usage)
- * [Installing TIRmite](#installing-tirmite)
- * [Example usage](#example-usage)
- * [Standard options](#standard-options)
- * [Custom DNA matrices](#custom-dna-matrices)
- * [Additional tools](additional-tools)
- * [tSplit](tsplit)
- * [tSplit algorithm overview](tsplit-algorithm-overview)
- * [tSplit options and usage](tsplit-options-and-usage)
+ * [Installing TIRmite](#installing-tirmite)
+ * [Example usage](#example-usage)
+ * [Standard options](#standard-options)
+ * [Custom DNA matrices](#custom-dna-matrices)
* [Issues](#issues)
* [License](#license)
- * [Logo](#logo)
-

## About TIRmite

@@ -41,9 +33,10 @@ genome-wide annotation of TIR families. These can be provided by the user or
built from aligned TIRs oriented as 5' outer edge --> 3' inner edge.

Three classes of output are produced:
+
1. All significant TIR hit sequences written to fasta (per query HMM).
2. Candidate elements comprised of paired TIRs are written to fasta (per query HMM).
- 3. Genomic annotations of candidate elements and, optionally, TIR hits
+ 3. Genomic annotations of candidate elements and, optionally, TIR hits
(paired and unpaired) are written as a single GFF3 file.
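
The GFF3 annotations can be read back with a few lines of Python. The sketch below is a minimal illustration that assumes only the standard nine tab-separated GFF3 columns. The function name `parse_gff3` and the sample record are hypothetical and are not taken from TIRmite or its output.

```python
def parse_gff3(lines):
    """Parse GFF3 lines into a list of dicts (illustrative helper, not part of TIRmite)."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            # Not a standard nine-column feature line; ignore it.
            continue
        seqid, source, ftype, start, end, score, strand, phase, attrs = fields
        # Column 9 holds semicolon-separated key=value pairs.
        attributes = dict(
            item.split("=", 1) for item in attrs.split(";") if "=" in item
        )
        records.append(
            {
                "seqid": seqid,
                "source": source,
                "type": ftype,
                "start": int(start),  # GFF3 coordinates are 1-based, inclusive
                "end": int(end),
                "score": score,
                "strand": strand,
                "phase": phase,
                "attributes": attributes,
            }
        )
    return records


# Hypothetical sample record; real TIRmite feature types and attributes may differ.
sample = [
    "##gff-version 3\n",
    "chr1\texample\tfeature\t100\t950\t.\t+\t.\tID=element_1\n",
]
records = parse_gff3(sample)
print(records[0]["seqid"], records[0]["start"], records[0]["end"])
```

Keeping coordinates as integers makes it easy to filter features by length or position afterwards.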

## Algorithm overview
@@ -67,13 +60,14 @@ Three classes of output are produced:

TIRmite requires Python >= v3.8

- Dependencies:
- - TIR-pHMM build and search
- * [HMMER3](http://hmmer.org)
- - Extract terminal repeats from predicted TEs
- * [pymummer](https://github.com/sanger-pathogens/pymummer) version >= 0.10.3 with wrapper for nucmer option *--diagfactor*.
- * [MUMmer](https://github.com/mummer4/mummer)
- * [BLAST+](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) (Optional)
+ Dependencies:
+
+ * TIR-pHMM build and search
+ * [HMMER3](http://hmmer.org)
+ * Extract terminal repeats from predicted TEs
+ * [pymummer](https://github.com/sanger-pathogens/pymummer) version >= 0.10.3 with wrapper for nucmer option *--diagfactor*.
+ * [MUMmer](https://github.com/mummer4/mummer)
+ * [BLAST+](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) (Optional)

You can create a Conda environment with these dependencies using the YAML files in this repo.

@@ -106,6 +100,7 @@ Install latest release from PyPi.
```

Install from Bioconda.
+
```bash
% conda install -c bioconda tirmite
```
@@ -123,16 +118,15 @@ Test installation.
```bash
# Print version number and exit.
% tirmite --version
- tirmite 1.1.6
+ tirmite 1.2.0

# Get usage information
% tirmite --help
```

### Example usage

- Report all hits and valid pairings of TIR_A in target.fasta (interval <= 10000, hits cover > 40% len of hmm model),
- and write GFF3 annotation file.
+ Report all hits and valid pairings of TIR_A in target.fasta (interval <= 10000, hits cover > 40% len of hmm model), and write GFF3 annotation file.

```bash
% tirmite --genome target.fasta --hmmFile TIR_A.hmm --gffOut TIR_elements_in_Target.gff3 --maxdist 10000 --mincov 0.4
@@ -148,11 +142,12 @@ In this example the two TIRs should be oriented to begin with "GA".

5\` **GA\>\>\>\>\>\>\>** ATGC <<<<<<<TC 3\`
3\` CT>>>>>>>> TACG <<<<<<<AG 5\`
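
The orientation shown in the diagram can be checked with a reverse complement. The sketch below is a minimal illustration on a made-up element and TIR length. `reverse_complement` and `oriented_tirs` are hypothetical helpers, not part of TIRmite. The left TIR is read as-is from the forward strand and the right TIR is reverse-complemented, so that both run from the 5' outer edge to the 3' inner edge.

```python
from typing import Tuple

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def oriented_tirs(element: str, tir_len: int) -> Tuple[str, str]:
    """Return (left, right) TIRs, each oriented 5' outer edge to 3' inner edge.

    Illustrative only: assumes both TIRs have the same, known length.
    """
    left = element[:tir_len]
    right = reverse_complement(element[-tir_len:])
    return left, right


# Made-up element: left TIR + internal segment + right TIR.
element = "GATTACA" + "ATGC" + "TGTAATC"
left, right = oriented_tirs(element, 7)
print(left, right)
```

For a perfect inverted repeat the two oriented sequences are identical, and both begin with the same outer-edge bases ("GA" in the diagram above).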
+
### Standard options

Run `tirmite --help` to view the program's most commonly used options:

- ```
+ ```code
tirmite [-h] [--version] --genome GENOME [--hmmDir HMMDIR]
[--hmmFile HMMFILE] [--alnDir ALNDIR] [--alnFile ALNFILE]
[--alnFormat {clustal,fasta,nexus,phylip,stockholm}]
@@ -230,107 +225,13 @@ Non-standard HMMER paths:

### Custom DNA Matrices

- nhmmer can be supplied with custom DNA score matrices for assessing hmm match scores.
- Standard NCBI-BLAST matrices such as NUC.4.4 are compatible. (See: ftp://ftp.ncbi.nlm.nih.gov/blast/matrices/NUC.4.4)
-
- ## Additional tools
-
- ### tSplit
-
- Extract Terminal Inverted Repeats (TIRs) DNA transposons.
-
- ### tSplit algorithm overview
-
- tSplit attempts to identify terminal repeats in transposable elements by
- first aligning each element to itself using nucmer, and then applying a set of
- tuneable heuristics to select an alignment pair most likely to represent a TIR.
-
- 1. Exclude all diagonal/self-matches
- 2. If tsplit-TIR: Retain only alignment pairs on opposite strands (inverse repeats)
- 3. Retain pairs for which the 5' match begins within x bases of element start
- and whose 3' match ends within x bases of element end
- 4. Exclude alignment pairs which overlap (potential SSRs)
- 5. If multiple candidates remain select alignment pair with largest internal segment
- (i.e. closest to element ends)
-
- ### tSplit options and usage
-
- ### tSplit example usage
-
- For each element in *dna-transposons.fasta* split into internal and external (TIR) segments.
- Split segments will be written to *TIR_split_TE-splitter_output.fasta* with suffix "_I" for
- internal or "_TIR" for external segments. TIRs must be at least 10bp in length and share 80%
- identity and occur within 10bp of each end of the input element. Additionally, synthetic
- MITEs will be constructed by concatenation of left and right TIRs, with internal segments
- excised.
-
-
- ```bash
- % tsplit-TIR -i dna-transposons.fasta -p TIR_split
- ```
-
- ### tSplit options
-
- Run `tsplit-TIR --help` to view the programs' most commonly used
- options:
-
- ```
- Usage: tsplit-TIR [-h] -i INFILE [-p PREFIX] [-d OUTDIR]
- [--splitmode {all,split,internal,external,None}]
- [--makemites] [--keeptemp] [-v] [-m MAXDIST]
- [--minid MINID] [--minterm MINTERM] [--minseed MINSEED]
- [--diagfactor DIAGFACTOR] [--method {blastn,nucmer}]
-
- Help:
- -h, --help Show this help message and exit.
-
- Input:
- -i, --infile Multifasta containing complete elements.
- (Required)
-
- Output:
- -p, --prefix All output files begin with this string. (Default:[infile basename])
- -d, --outdir Write output files to this directory. (Default: cwd)
- --keeptemp If set do not remove temp directory on completion.
- -v, --verbose If set, report progress.
-
- Report settings:
- --splitmode Options: {all,split,internal,external,None}
- all = Report input sequence as well as internal and external segments.
- split = Report internal and external segments after splitting.
- internal = Report only internal segments.
- external = Report only terminal repeat segments.
- None = Only report synthetic MITES (when --makemites is also set).
- (Default: split)
- --makemites Experimental function: Attempt to construct synthetic MITE sequences from TIRs by concatenating
- 5' and 3' TIRs. Available only in 'tsplit-TIR' mode
-
- Alignment settings:
- --method Select alignment tool. Note: blastn may perform better on very short high-identity TRs,
- while nucmer is more robust to small indels.
- Options: {blastn,nucmer}
- (Default: nucmer)
- --minid Minimum identity between terminal repeat pairs. As float.
- (Default: 80.0)
- --minterm Minimum length for a terminal repeat to be considered.
- Equivalent to nucmer "--mincluster"
- (Default: 10)
- -m, --maxdist Terminal repeat candidates must be no more than this many bases from ends of an input element.
- Note: Increase this value if you suspect that your element is nested within some flanking sequence.
- (Default: 10)
- --minseed Minimum length of a maximal exact match to be included in final match cluster.
- Equivalent to nucmer "--minmatch".
- (Default: 5)
- --diagfactor Maximum diagonal difference factor for clustering of matches within nucmer,
- i.e. diagonal difference / match separation
- (default 0.20)
- Note: Increase value for greater tolerance of indels between terminal repeats.
- ```
+ nhmmer can be supplied with custom DNA score matrices for assessing hmm match scores.
+ Standard NCBI-BLAST matrices such as NUC.4.4 are compatible. (See: ftp://ftp.ncbi.nlm.nih.gov/blast/matrices/NUC.4.4)

## Issues

Submit feedback to the [Issue Tracker](https://github.com/Adamtaranto/TIRmite/issues)

## License

- Software provided under MIT license.
+ Software provided under MIT license.