Commit b67ae8b

update to version 2.0
1 parent f737509 commit b67ae8b

2 files changed

+106
-36
lines changed

README.md

Lines changed: 103 additions & 36 deletions
@@ -1,22 +1,24 @@
11

2-
FunOrder
3-
=========
2+
FunOrder 2
3+
==========
44

5-
The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution – searches for co-evolutionary linked genes in a set of inputted genes. The functionality and applicability was tested with biosynthetic gene clusters (BGCs). The resulting information can be used to choose which genes of a gene cluster are most likely the core genes necessary for the biosynthesis of a secondary metabolite. The flexibility and adaptability of the core program allows the integration of any protein database and can thus be adapted for different phyla and research objectives. FunOrder might be used for the analysis of co-evolution on a whole proteome, enabling the genome wide detection of evolutionary linked genes, in the future.
5+
The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution – searches for co-evolutionarily linked genes in a set of input genes. The functionality and applicability were tested with biosynthetic gene clusters (BGCs). The resulting information can be used to choose which genes of a gene cluster are most likely the core genes necessary for the biosynthesis of a secondary metabolite. The flexibility and adaptability of the core program allow the integration of any protein database, so the tool can be adapted for different phyla and research objectives. FunOrder might also be used for the analysis of co-evolution on a whole proteome, enabling the genome-wide detection of evolutionarily linked genes.
66

77
The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution. FunOrder is copyright 2020 Gabriel A. Vignolle, Denise Schaffer, Robert L. Mach, Astrid R. Mach-Aigner and Christian Derntl, and is released under the MIT License. If you find FunOrder useful to your work, please cite:
88

9-
https://zenodo.org/record/5118984 and DOI: 10.5281/zenodo.5118984 for the code and
9+
**FunOrder 2.0 – a fully automated method for the identification of co-evolved genes**
10+
11+
https://zenodo.org/record/5118984 and DOI: 10.5281/zenodo.5118984 for the code and
1012

1113
Vignolle GA, Schaffer D, Zehetner L, Mach RL, Mach-Aigner AR, Derntl C (2021) **FunOrder: A robust and semi-automated method for the identification of essential biosynthetic genes through computational molecular co-evolution.** PLoS Comput Biol 17(9): e1009372. doi: https://doi.org/10.1371/journal.pcbi.1009372
1214

1315
**The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution** Gabriel A Vignolle, Denise Schaffer, Robert L Mach, Astrid R Mach-Aigner, Christian Derntl. **bioRxiv** 2021.01.29.428829; doi: https://doi.org/10.1101/2021.01.29.428829
1416

1517
The software input files are biosynthetic gene clusters (BGC) with gene translations in genbank file format or fasta format, that contain the amino acid sequences of all the genes found in the BGC of interest.
1618

17-
FunOrder performs a sequence similarity search using blastp on our manually curated database, multiple sequence alignment using the ClustalW algorithm, calculates the best scoring ML tree with RAxML (Randomized Axelerated Maximum Likelihood) for each gene and uses the TreeKO algorithm to calculate the pairwise distances between these trees. All pairwise **strict** and **evolutionary** distances are saved as matrices respectively. The matrices are used as input for an R script for visualization and further analysis of the distances. The strict and evolutionary distances are summed up to a third **combined** distance measure. For further detail and an exemplary analysis of the FunOrder output, see our publication.
19+
FunOrder performs a sequence similarity search using blastp on our manually curated database, multiple sequence alignment using the ClustalW algorithm, calculates the best scoring ML tree with RAxML (Randomized Axelerated Maximum Likelihood) for each gene and uses the TreeKO algorithm to calculate the pairwise distances between these trees. Based on these distances, **FunOrder 2** automatically determines the optimal number of clusters in the output; a subsequent k-means clustering on the first three principal components of the PCAs then groups the genes/proteins into co-evolutionarily linked protein families. See our newest publications for further details.
1820

19-
The three distance matrices are first visualized as heatmaps with a dendrogram computed with the complete linkage method, that finds similar clusters. Then the Euclidean distance within the matrices is computed and clustered using Ward’s minimum variance method aiming at finding compact spherical clusters, with the implemented squaring of the dissimilarities before cluster updating, for each of the three distance matrices separately, with scaled and unscaled input data. Lastly a principle component analysis (PCA) is performed on each distance matrix and the score plot of the first two principle components visualized, respectively. FunOrder includes scripts adapted to the use on servers and for the integration in various pipelines.
21+
FunOrder 2 is provided with a database of ascomycete proteomes and can therefore be used to detect co-evolution of proteins in this fungal division. If other divisions, classes, or even kingdoms are to be analyzed, a suitable new proteome database must be compiled and tested; see our Wiki for further details.
2022

2123

2224
Dependencies
@@ -45,6 +47,11 @@ R packages:
4547
* gplots
4648
* car
4749
* mdatools
50+
* xlsx
51+
* cluster
52+
* NbClust
53+
* randtests
54+
4855

4956
Installation
5057
------------
@@ -67,23 +74,27 @@ install.packages('stats') # at the R prompt
6774
install.packages('gplots') # at the R prompt
6875
install.packages('car') # at the R prompt
6976
install.packages('mdatools') # at the R prompt
77+
install.packages('xlsx') # at the R prompt
78+
install.packages('cluster') # at the R prompt
79+
install.packages('NbClust') # at the R prompt
80+
install.packages('randtests') # at the R prompt
7081
```
7182

72-
Now download FunOrder **funorder_v1.tar.xz** and unpack the archive.
83+
Now download FunOrder **funorder_XX.tar.xz** and unpack the archive.
7384

7485
```
75-
tar -xf funorder_v1.tar.xz
86+
tar -xf funorder_XX.tar.xz
7687
```
7788

7889
open the scripts funorder.sh ; funorder_fasta_only.sh ; funorder_server.sh ; funorder_server_fasta_only.sh
7990
change 'SOURCEDIR' value in line 43 in funorder.sh ; funorder_fasta_only.sh and line 45 in funorder_server.sh ; funorder_server_fasta_only.sh:
8091

8192
```
82-
SOURCEDIR=~/funorder_proj/funorder_v1/
93+
SOURCEDIR=~/funorder_proj/funorder_XX/
8394
```
84-
to (path to the funorder_v1 directory: e.g. ~/path/to/your/directory/)
95+
to (path to the funorder_XX directory: e.g. ~/path/to/your/directory/)
8596
```
86-
SOURCEDIR=~/path/to/your/directory/funorder_v1/
97+
SOURCEDIR=~/path/to/your/directory/funorder_XX/
8798
```
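The SOURCEDIR change described above can also be scripted. The following is a minimal sketch and not part of FunOrder: `set_sourcedir` is a hypothetical helper, and it assumes that each script contains a single line beginning with `SOURCEDIR=` (no leading whitespace) and that the directory path contains no `|`, `&`, or backslash characters. It keeps a `.bak` copy of every script it changes.

```sh
# Hypothetical helper (not shipped with FunOrder): rewrite the SOURCEDIR
# assignment in the four pipeline scripts so it points at the given directory.
set_sourcedir() {
    dir=${1%/}    # drop one trailing slash, if present
    for script in funorder.sh funorder_fasta_only.sh \
                  funorder_server.sh funorder_server_fasta_only.sh; do
        # Skip scripts that are not present in this directory.
        [ -f "$dir/$script" ] || continue
        # Replace the whole assignment line; the original is kept as *.bak.
        sed -i.bak "s|^SOURCEDIR=.*|SOURCEDIR=$dir/|" "$dir/$script"
    done
}
```

For example, `set_sourcedir ~/path/to/your/directory/funorder_XX` would write the expanded absolute path into each script. Matching on the `SOURCEDIR=` prefix rather than on a fixed line number means the sketch does not depend on the assignment being on line 43 or 45.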
8899

89100
You can now add the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -99,7 +110,7 @@ Run FunOrder from the folder containing the gbk file you want to analyze.
99110
(cd ~/path/to/your/gbk_files)
100111

101112
```
102-
sh ~/path/to/directory/funorder_v1/funorder.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
113+
sh ~/path/to/directory/funorder_XX/funorder.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
103114
```
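Because the pipeline expects a thread number, an existing input file, and an absolute output path, it can help to validate these before starting a long run. The following is a sketch under stated assumptions: `check_funorder_args` is a hypothetical helper, not part of FunOrder, and it checks only the first three arguments (the database argument is passed through unchecked because its expected form is not described here).

```sh
# Hypothetical pre-flight check (not part of FunOrder).
# Usage: check_funorder_args THREADS INPUT_FILE OUTPUT_DIR
check_funorder_args() {
    threads=$1
    input=$2
    outdir=$3
    # The thread number must consist of digits only ...
    case $threads in
        ''|*[!0-9]*)
            echo "thread number must be a positive integer: $threads" >&2
            return 1 ;;
    esac
    # ... and must be greater than zero.
    if [ "$threads" -le 0 ]; then
        echo "thread number must be greater than zero" >&2
        return 1
    fi
    # The gbk or fasta input file has to exist.
    if [ ! -f "$input" ]; then
        echo "input file not found: $input" >&2
        return 1
    fi
    # The output directory must be given as an absolute path.
    case $outdir in
        /*) ;;
        *)
            echo "output directory must be an absolute path: $outdir" >&2
            return 1 ;;
    esac
}
```

A call such as `check_funorder_args 4 cluster.gbk /abs/path/to/output && sh funorder.sh 4 cluster.gbk /abs/path/to/output [database]` would then start the pipeline only when the checks pass; `cluster.gbk` and the output path are placeholders.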
104115

105116
or if you added the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -119,12 +130,25 @@ The output of FunOrder is saved in /file.gbk.analysis/alignment
119130

120131
#### Output files produced by funorder.sh
121132

122-
File | Description
123-
-----------------------------|------------
124-
Rplot.pdf | PDF file with the Analyze.R output as described in our publication
125-
strict_distance.matrix | matrix of the strict distance
126-
evol_distance.matrix | matrix of the evolutionary [speciation] distance
133+
File | Description
134+
------------------------------------|------------
135+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
136+
FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
137+
cluster_definition_pred.xlsx | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
138+
strict_distance.matrix | matrix of the strict distance
139+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
140+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
127141

142+
If the automatic clustering failed, the output files are:
143+
144+
File | Description
145+
---------------------------------------|------------
146+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
147+
FunOrder_clustering_Rplots_defined.pdf | PDF file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
148+
cluster_definition_3.xlsx | XLSX file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
149+
strict_distance.matrix | matrix of the strict distance
150+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
151+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
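
The two tables differ only in the clustering files, so a script can tell which case applies by checking which XLSX file exists. The following is a minimal sketch: `clustering_mode` is a hypothetical helper, not part of FunOrder, and it assumes it is given the directory that contains the output files listed above.

```sh
# Hypothetical helper (not part of FunOrder): report which clustering
# output a finished run produced, based on the file names in the tables.
clustering_mode() {
    dir=$1
    if [ -f "$dir/cluster_definition_pred.xlsx" ]; then
        # Automatic determination of the cluster number succeeded.
        echo pred
    elif [ -f "$dir/cluster_definition_3.xlsx" ]; then
        # Automatic clustering failed; the "defined" output was written.
        echo defined
    else
        # Neither file exists: the run is incomplete or failed earlier.
        echo none
        return 1
    fi
}
```

In a batch run, the printed value can be used to decide which PDF (`FunOrder_clustering_Rplots_pred.pdf` or `FunOrder_clustering_Rplots_defined.pdf`) to inspect for each cluster.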
128152

129153

130154

@@ -135,7 +159,7 @@ Run FunOrder from the folder containing the fasta file you want to analyze.
135159
(cd ~/path/to/your/fasta_files)
136160

137161
```
138-
sh ~/path/to/directory/funorder_v1/funorder_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
162+
sh ~/path/to/directory/funorder_XX/funorder_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
139163
```
140164

141165
or if you added the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -155,12 +179,25 @@ The output of FunOrder is saved in /file.fasta.analysis/alignment
155179

156180
#### Output files produced by funorder_fasta_only.sh
157181

158-
File | Description
159-
-----------------------------|------------
160-
Rplot.pdf | PDF file with the Analyze.R output as described in our publication
161-
strict_distance.matrix | matrix of the strict distance
162-
evol_distance.matrix | matrix of the evolutionary [speciation] distance
182+
File | Description
183+
------------------------------------|------------
184+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
185+
FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
186+
cluster_definition_pred.xlsx | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
187+
strict_distance.matrix | matrix of the strict distance
188+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
189+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
190+
191+
If the automatic clustering failed, the output files are:
163192

193+
File | Description
194+
---------------------------------------|------------
195+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
196+
FunOrder_clustering_Rplots_defined.pdf | PDF file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
197+
cluster_definition_3.xlsx | XLSX file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
198+
strict_distance.matrix | matrix of the strict distance
199+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
200+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
164201

165202

166203

@@ -171,7 +208,7 @@ Run FunOrder from the folder containing the gbk file you want to analyze.
171208
(cd ~/path/to/your/gbk_files)
172209

173210
```
174-
sh ~/path/to/directory/funorder_v1/funorder_server.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
211+
sh ~/path/to/directory/funorder_XX/funorder_server.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
175212
```
176213

177214
or if you added the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -190,11 +227,26 @@ The output of FunOrder is saved in /file.gbk.analysis/alignment
190227

191228
#### Output files produced by funorder_server.sh
192229

193-
File | Description
194-
-----------------------------|------------
195-
Rplot.pdf | PDF file with the Analyze.R output as described in our publication
196-
strict_distance.matrix | matrix of the strict distance
197-
evol_distance.matrix | matrix of the evolutionary [speciation] distance
230+
File | Description
231+
------------------------------------|------------
232+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
233+
FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
234+
cluster_definition_pred.xlsx | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
235+
strict_distance.matrix | matrix of the strict distance
236+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
237+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
238+
239+
If the automatic clustering failed, the output files are:
240+
241+
File | Description
242+
---------------------------------------|------------
243+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
244+
FunOrder_clustering_Rplots_defined.pdf | PDF file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
245+
cluster_definition_3.xlsx | XLSX file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
246+
strict_distance.matrix | matrix of the strict distance
247+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
248+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
249+
198250

199251

200252
#### Example usage for generic antiSMASH output:
@@ -208,7 +260,7 @@ mkdir funorder_output
208260
then from within the antiSMASH output-folder run following command:
209261

210262
```
211-
for file in *cluster*.gbk; do echo $file; sh ~/path/to/directory/funorder_v1/funorder_server.sh [Thread number] $file [absolute path to "funorder_output" directory] [database] ; done
263+
for file in *cluster*.gbk; do echo $file; sh ~/path/to/directory/funorder_XX/funorder_server.sh [Thread number] $file [absolute path to "funorder_output" directory] [database] ; done
212264
```
213265

214266
This will perform a FunOrder analysis for each cluster predicted by antiSMASH.
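
The one-line loop above stops reporting nothing if a single cluster fails and breaks on file names with spaces. A more defensive variant could look like the sketch below. `run_funorder_on_clusters` is a hypothetical wrapper, not part of FunOrder; it takes the path to `funorder_server.sh` as its first argument and passes the remaining arguments through in the order shown above.

```sh
# Hypothetical wrapper (not part of FunOrder): run the given FunOrder script
# on every *cluster*.gbk file in the current directory.
# Usage: run_funorder_on_clusters SCRIPT THREADS OUTPUT_DIR DATABASE
run_funorder_on_clusters() {
    script=$1
    threads=$2
    outdir=$3
    database=$4
    failed=0
    for file in *cluster*.gbk; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$file" ] || continue
        echo "$file"
        # Quote every argument so paths containing spaces survive.
        if ! sh "$script" "$threads" "$file" "$outdir" "$database"; then
            echo "FunOrder failed for $file" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every cluster was processed without error.
    [ "$failed" -eq 0 ]
}
```

Run from within the antiSMASH output folder, the wrapper continues with the next cluster after a failure and returns a non-zero status at the end if any run failed.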
@@ -220,7 +272,7 @@ Run FunOrder from the folder containing the fasta file you want to analyze.
220272
(cd ~/path/to/your/fasta_files)
221273

222274
```
223-
sh ~/path/to/directory/funorder_v1/funorder_server_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
275+
sh ~/path/to/directory/funorder_XX/funorder_server_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
224276
```
225277

226278
or if you added the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -239,9 +291,24 @@ The output of FunOrder is saved in /file.fasta.analysis/alignment
239291

240292
#### Output files produced by funorder_server_fasta_only.sh
241293

242-
File | Description
243-
-----------------------------|------------
244-
Rplot.pdf | PDF file with the Analyze.R output as described in our publication
245-
strict_distance.matrix | matrix of the strict distance
246-
evol_distance.matrix | matrix of the evolutionary [speciation] distance
294+
File | Description
295+
------------------------------------|------------
296+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
297+
FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
298+
cluster_definition_pred.xlsx | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
299+
strict_distance.matrix | matrix of the strict distance
300+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
301+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
302+
303+
If the automatic clustering failed, the output files are:
304+
305+
File | Description
306+
---------------------------------------|------------
307+
FunOrder_Supplementary_Rplots.pdf | PDF file with the Analyze.R output as described in our publication FunOrder 2
308+
FunOrder_clustering_Rplots_defined.pdf | PDF file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
309+
cluster_definition_3.xlsx | XLSX file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
310+
strict_distance.matrix | matrix of the strict distance
311+
evol_distance.matrix | matrix of the evolutionary [speciation] distance
312+
Internal_coevolution_quotient.txt | text file containing the ICQ analysis
313+
247314

funorder_2.0.tar.xz

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:c17dde0c5b2d10e999a36cd27be5db12819b8c05074ea360575c04fb8f4aee94
3+
size 1014203552
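
The three added lines are a Git LFS pointer: the repository records only the SHA-256 object ID and the size in bytes (roughly 1 GB) of `funorder_2.0.tar.xz`, while the archive itself is stored in LFS. A downloaded archive can be checked against a saved copy of the pointer as sketched below. `verify_lfs_object` is a hypothetical helper, and the sketch assumes `sha256sum` from GNU coreutils is available (on macOS, `shasum -a 256` could be substituted).

```sh
# Hypothetical helper: compare a downloaded file with a Git LFS pointer file.
# Usage: verify_lfs_object POINTER_FILE ARCHIVE
verify_lfs_object() {
    pointer=$1
    archive=$2
    # Extract the expected hash and size from the pointer text.
    want_oid=$(sed -n 's/^oid sha256://p' "$pointer")
    want_size=$(sed -n 's/^size //p' "$pointer")
    # Compute the actual hash and size of the archive.
    have_oid=$(sha256sum "$archive" | cut -d ' ' -f 1)
    have_size=$(wc -c < "$archive" | tr -d ' ')
    # Succeed only if the pointer was parsed and both values match.
    [ -n "$want_oid" ] &&
        [ "$want_oid" = "$have_oid" ] &&
        [ "$want_size" = "$have_size" ]
}
```

A non-zero exit status suggests an incomplete download, or that the file on disk is still the small pointer file rather than the real archive.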
