The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution - searches for co-evolutionarily linked genes in a set of input genes. Its functionality and applicability were tested with biosynthetic gene clusters (BGCs). The resulting information can be used to choose which genes of a gene cluster are most likely the core genes necessary for the biosynthesis of a secondary metabolite. The flexibility and adaptability of the core program allow the integration of any protein database, so it can be adapted for different phyla and research objectives. In the future, FunOrder might be used to analyze co-evolution across a whole proteome, enabling the genome-wide detection of evolutionarily linked genes.
5
+
The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution - searches for co-evolutionarily linked genes in a set of input genes. Its functionality and applicability were tested with biosynthetic gene clusters (BGCs). The resulting information can be used to choose which genes of a gene cluster are most likely the core genes necessary for the biosynthesis of a secondary metabolite. The flexibility and adaptability of the core program allow the integration of any protein database, so it can be adapted for different phyla and research objectives. FunOrder might be used to analyze co-evolution across a whole proteome, enabling the genome-wide detection of evolutionarily linked genes.
6
6
7
7
The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution. FunOrder is copyright 2020 Gabriel A. Vignolle, Denise Schaffer, Robert L. Mach, Astrid R. Mach-Aigner and Christian Derntl, and is released under the MIT License. If you find FunOrder useful to your work, please cite:
8
8
9
-
https://zenodo.org/record/5118984 and DOI: 10.5281/zenodo.5118984 for the code and
9
+
**FunOrder 2.0 – a fully automated method for the identification of co-evolved genes**
10
+
11
+
https://zenodo.org/record/5118984 and DOI: 10.5281/zenodo.5118984 for the code and
10
12
11
13
Vignolle GA, Schaffer D, Zehetner L, Mach RL, Mach-Aigner AR, Derntl C (2021) **FunOrder: A robust and semi-automated method for the identification of essential biosynthetic genes through computational molecular co-evolution.** PLoS Comput Biol 17(9): e1009372. doi: https://doi.org/10.1371/journal.pcbi.1009372
12
14
13
15
**The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution** Gabriel A Vignolle, Denise Schaffer, Robert L Mach, Astrid R Mach-Aigner, Christian Derntl. **bioRxiv** 2021.01.29.428829; doi: https://doi.org/10.1101/2021.01.29.428829
14
16
15
17
The software input files are biosynthetic gene clusters (BGCs) with gene translations in GenBank or FASTA format, containing the amino acid sequences of all genes found in the BGC of interest.
16
18
17
-
FunOrder performs a sequence similarity search using blastp on our manually curated database, performs a multiple sequence alignment using the ClustalW algorithm, calculates the best-scoring ML tree with RAxML (Randomized Axelerated Maximum Likelihood) for each gene, and uses the TreeKO algorithm to calculate the pairwise distances between these trees. All pairwise **strict** and **evolutionary** distances are saved as separate matrices. The matrices are used as input for an R script for visualization and further analysis of the distances. The strict and evolutionary distances are summed to give a third, **combined** distance measure. For further detail and an exemplary analysis of the FunOrder output, see our publication.
19
+
FunOrder performs a sequence similarity search using blastp on our manually curated database, performs a multiple sequence alignment using the ClustalW algorithm, calculates the best-scoring ML tree with RAxML (Randomized Axelerated Maximum Likelihood) for each gene, and uses the TreeKO algorithm to calculate the pairwise distances between these trees. Based on these distances, **FunOrder 2** automatically determines the optimal number of clusters in the output; a subsequent k-means clustering on the first three principal components of the PCAs groups the genes/proteins into co-evolutionarily linked protein families. See our newest publications for further details.
18
20
19
-
The three distance matrices are first visualized as heatmaps with a dendrogram computed with the complete linkage method, which finds similar clusters. Then the Euclidean distance within the matrices is computed and clustered using Ward's minimum variance method, which aims at finding compact spherical clusters and squares the dissimilarities before cluster updating. This is done for each of the three distance matrices separately, with scaled and unscaled input data. Lastly, a principal component analysis (PCA) is performed on each distance matrix, and the score plot of the first two principal components is visualized for each. FunOrder includes scripts adapted for use on servers and for integration into various pipelines.
21
+
FunOrder 2 is provided with a database of ascomycete proteomes and can therefore be used to detect co-evolution of proteins in this fungal division. To analyze other divisions, classes, or even kingdoms, a suitable new proteome database must be compiled and tested; see our Wiki for further details.
20
22
21
23
22
24
Dependencies
@@ -45,6 +47,11 @@ R packages:
45
47
* gplots
46
48
* car
47
49
* mdatools
50
+
* xlsx
51
+
* cluster
52
+
* NbClust
53
+
* randtests
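Before running the pipeline it can help to confirm that the R packages listed above are actually installed. The following is a minimal sketch, not part of FunOrder: the function name `missing_r_packages` is hypothetical, and it assumes that `Rscript` is on your PATH when you use it as shown in the comment.

```shell
# Print every required package (given as arguments) that does not appear
# in the newline-separated list of installed packages read from stdin.
# Hypothetical helper, not shipped with FunOrder.
missing_r_packages() {
    installed=$(cat)
    for pkg in "$@"; do
        # -x matches whole lines only, -F treats the name as a fixed string
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
    done
}

# Usage (assumes Rscript is available):
#   Rscript -e 'cat(rownames(installed.packages()), sep = "\n")' |
#       missing_r_packages gplots car mdatools xlsx cluster NbClust randtests
```

An empty result means that every package named on the command line was found; any name that is printed still needs `install.packages()`.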
54
+
48
55
49
56
Installation
50
57
------------
@@ -67,23 +74,27 @@ install.packages('stats') # at the R prompt
67
74
install.packages('gplots') # at the R prompt
68
75
install.packages('car') # at the R prompt
69
76
install.packages('mdatools') # at the R prompt
77
+
install.packages('xlsx') # at the R prompt
78
+
install.packages('cluster') # at the R prompt
79
+
install.packages('NbClust') # at the R prompt
80
+
install.packages('randtests') # at the R prompt
70
81
```
71
82
72
-
Now download FunOrder **funorder_v1.tar.xz** and unpack the archive.
83
+
Now download FunOrder **funorder_XX.tar.xz** and unpack the archive.
73
84
74
85
```
75
-
tar -xf funorder_v1.tar.xz
86
+
tar -xf funorder_XX.tar.xz
76
87
```
77
88
78
89
Open the scripts funorder.sh, funorder_fasta_only.sh, funorder_server.sh, and funorder_server_fasta_only.sh.
79
90
Change the 'SOURCEDIR' value in line 43 of funorder.sh and funorder_fasta_only.sh, and in line 45 of funorder_server.sh and funorder_server_fasta_only.sh:
80
91
81
92
```
82
-
SOURCEDIR=~/funorder_proj/funorder_v1/
93
+
SOURCEDIR=~/funorder_proj/funorder_XX/
83
94
```
84
-
to (path to the funorder_v1 directory: e.g. ~/path/to/your/directory/)
95
+
to the path of your funorder_XX directory (e.g. ~/path/to/your/directory/):
85
96
```
86
-
SOURCEDIR=~/path/to/your/directory/funorder_v1/
97
+
SOURCEDIR=~/path/to/your/directory/funorder_XX/
87
98
```
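The manual edit above can also be scripted. This is a minimal sketch under one assumption: the assignment starts at the beginning of a line as `SOURCEDIR=`, so the sketch matches on the variable name instead of on line 43 or 45. The function name `set_sourcedir` is hypothetical and not part of FunOrder.

```shell
# Rewrite the SOURCEDIR assignment in one script.
# $1 = script to edit, $2 = path to the unpacked FunOrder directory
# Hypothetical helper; assumes the line starts with "SOURCEDIR=".
set_sourcedir() {
    script=$1
    newdir=$2
    tmp=$(mktemp) || return 1
    # Replace only lines that start with "SOURCEDIR="; copy all others unchanged.
    awk -v dir="$newdir" '/^SOURCEDIR=/ { print "SOURCEDIR=" dir; next } { print }' "$script" > "$tmp" &&
        cat "$tmp" > "$script"   # write back in place, keeping file permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage:
#   for s in funorder.sh funorder_fasta_only.sh funorder_server.sh funorder_server_fasta_only.sh; do
#       set_sourcedir "$s" ~/path/to/your/directory/funorder_XX/
#   done
```

Check the edited scripts afterwards, since a script that assigns `SOURCEDIR` more than once would have every such line rewritten.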
88
99
89
100
You can now add the FunOrder/pipeline directory to your $PATH environment variable.
@@ -99,7 +110,7 @@ Run FunOrder from the folder containing the gbk file you want to analyze.
99
110
(cd ~/path/to/your/gbk_files)
100
111
101
112
```
102
-
sh ~/path/to/directory/funorder_v1/funorder.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
113
+
sh ~/path/to/directory/funorder_XX/funorder.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
103
114
```
104
115
105
116
Or, if you added the FunOrder/pipeline directory to your $PATH environment variable.
@@ -119,12 +130,25 @@ The output of FunOrder is saved in /file.gbk.analysis/alignment
119
130
120
131
#### Output files produced by funorder.sh
121
132
122
-
File | Description
123
-
-----------------------------|------------
124
-
Rplot.pdf | PDF file with the Analyze.R output as described in our publication
125
-
strict_distance.matrix | matrix of the strict distance
126
-
evol_distance.matrix | matrix of the evolutionary [speciation] distance
133
+
File | Description
134
+
------------------------------------|------------
135
+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
136
+
FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
137
+
cluster_definition_pred.xlsx | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
138
+
strict_distance.matrix | matrix of the strict distance
139
+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
140
+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
127
141
142
+
If the automatic clustering failed, the output files are:
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
244
+
FunOrder_clustering_Rplots_defined.pdf | PDF file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
245
+
cluster_definition_3.xlsx | XLSX file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
246
+
strict_distance.matrix | matrix of the strict distance
247
+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
248
+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
249
+
198
250
199
251
200
252
#### Example usage for generic antiSMASH output:
@@ -208,7 +260,7 @@ mkdir funorder_output
208
260
then from within the antiSMASH output-folder run following command:
209
261
210
262
```
211
-
for file in *cluster*.gbk; do echo $file; sh ~/path/to/directory/funorder_v1/funorder_server.sh [Thread number] $file [absolute path to "funorder_output" directory] [database] ; done
263
+
for file in *cluster*.gbk; do echo $file; sh ~/path/to/directory/funorder_XX/funorder_server.sh [Thread number] $file [absolute path to "funorder_output" directory] [database] ; done
212
264
```
213
265
214
266
This will perform a FunOrder analysis for each cluster predicted by antiSMASH.
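The one-line loop above can be wrapped in a function that quotes file names, skips the case where no cluster file matches, and offers a dry run. This is a sketch only: `run_funorder_on_clusters` and the `DRY_RUN` variable are hypothetical names, and the argument order simply mirrors the `funorder_server.sh` call shown above.

```shell
# Run (or, with DRY_RUN=1, only print) one funorder_server.sh call per
# cluster file in the current directory.
# $1 = path to funorder_server.sh, $2 = thread number,
# $3 = absolute path to the output directory, $4 = database
# Hypothetical wrapper, not shipped with FunOrder.
run_funorder_on_clusters() {
    script=$1; threads=$2; outdir=$3; db=$4
    for file in *cluster*.gbk; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$file" ] || continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'sh %s %s %s %s %s\n' "$script" "$threads" "$file" "$outdir" "$db"
        else
            echo "$file"
            # Report a failure but continue with the remaining clusters.
            sh "$script" "$threads" "$file" "$outdir" "$db" ||
                echo "FunOrder failed for $file" >&2
        fi
    done
}

# Usage, from within the antiSMASH output folder:
#   DRY_RUN=1 run_funorder_on_clusters ~/path/to/directory/funorder_XX/funorder_server.sh \
#       [Thread number] [absolute path to "funorder_output" directory] [database]
```

Running with `DRY_RUN=1` first shows exactly which commands would be executed before any long analysis starts.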
@@ -220,7 +272,7 @@ Run FunOrder from the folder containing the fasta file you want to analyze.
220
272
(cd ~/path/to/your/fasta_files)
221
273
222
274
```
223
-
sh ~/path/to/directory/funorder_v1/funorder_server_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
275
+
sh ~/path/to/directory/funorder_XX/funorder_server_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
224
276
```
225
277
226
278
Or, if you added the FunOrder/pipeline directory to your $PATH environment variable.
@@ -239,9 +291,24 @@ The output of FunOrder is saved in /file.fasta.analysis/alignment
239
291
240
292
#### Output files produced by funorder_fasta_only.sh
241
293
242
-
File | Description
243
-
-----------------------------|------------
244
-
Rplot.pdf | PDF file with the Analyze.R output as described in our publication
245
-
strict_distance.matrix | matrix of the strict distance
246
-
evol_distance.matrix | matrix of the evolutionary [speciation] distance
294
+
File | Description
295
+
------------------------------------|------------
296
+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
297
+
FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
298
+
cluster_definition_pred.xlsx | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
299
+
strict_distance.matrix | matrix of the strict distance
300
+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
301
+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
302
+
303
+
If the automatic clustering failed, the output files are: