update to version 2.0

gvignolle · gvignolle · commit b67ae8b06c16 · 2022-01-07T13:13:02.000+01:00
diff --git a/README.md b/README.md
@@ -1,22 +1,24 @@
 
-FunOrder
-=========
+FunOrder 2
+==========
 
-The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution – searches for co-evolutionary linked genes in a set of inputted genes. The functionality and applicability was tested with biosynthetic gene clusters (BGCs). The resulting information can be used to choose which genes of a gene cluster are most likely the core genes necessary for the biosynthesis of a secondary metabolite. The flexibility and adaptability of the core program allows the integration of any protein database and can thus be adapted for different phyla and research objectives. FunOrder might be used for the analysis of co-evolution on a whole proteome, enabling the genome wide detection of evolutionary linked genes, in the future. 
+The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution – searches for co-evolutionarily linked genes in a set of input genes. Its functionality and applicability were tested with biosynthetic gene clusters (BGCs). The resulting information can be used to choose which genes of a gene cluster are most likely the core genes necessary for the biosynthesis of a secondary metabolite. The flexibility and adaptability of the core program allow the integration of any protein database, so FunOrder can be adapted for different phyla and research objectives. FunOrder might also be used to analyze co-evolution across a whole proteome, enabling the genome-wide detection of evolutionarily linked genes.
 
 The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution. FunOrder is copyright 2020 Gabriel A. Vignolle, Denise Schaffer, Robert L. Mach, Astrid R. Mach-Aigner and Christian Derntl, and is released under the MIT License. If you find FunOrder useful to your work, please cite:
 
-https://zenodo.org/record/5118984 and  DOI: 10.5281/zenodo.5118984 for the code and
+**FunOrder 2.0 – a fully automated method for the identification of co-evolved genes**
+
+https://zenodo.org/record/5118984 and DOI: 10.5281/zenodo.5118984 for the code and
 
 Vignolle GA, Schaffer D, Zehetner L, Mach RL, Mach-Aigner AR, Derntl C (2021) **FunOrder: A robust and semi-automated method for the identification of essential biosynthetic genes through computational molecular co-evolution.** PLoS Comput Biol 17(9): e1009372. doi: https://doi.org/10.1371/journal.pcbi.1009372
 
 **The Functional Order (FunOrder) tool - Identification of essential biosynthetic genes through computational molecular co-evolution** Gabriel A Vignolle, Denise Schaffer, Robert L Mach, Astrid R Mach-Aigner, Christian Derntl. **bioRxiv** 2021.01.29.428829; doi: https://doi.org/10.1101/2021.01.29.428829
 
 The software input files are biosynthetic gene clusters (BGC) with gene translations in genbank file format or fasta format, that contain the amino acid sequences of all the genes found in the BGC of interest. 
 
-FunOrder performs a sequence similarity search using blastp on our manually curated database, multiple sequence alignment using the ClustalW algorithm, calculates the best scoring ML tree with RAxML (Randomized Axelerated Maximum Likelihood) for each gene and uses the TreeKO algorithm to calculate the pairwise distances between these trees. All pairwise **strict** and **evolutionary** distances are saved as matrices respectively. The matrices are used as input for an R script for visualization and further analysis of the distances. The strict and evolutionary distances are summed up to a third **combined** distance measure. For further detail and an exemplary analysis of the FunOrder output, see our publication.
+FunOrder performs a sequence similarity search using blastp on our manually curated database, performs a multiple sequence alignment using the ClustalW algorithm, calculates the best-scoring ML tree with RAxML (Randomized Axelerated Maximum Likelihood) for each gene, and uses the TreeKO algorithm to calculate the pairwise distances between these trees. Based on these distances, **FunOrder 2** automatically determines the optimal number of clusters in the output; a subsequent k-means clustering on the first three principal components of the PCAs then groups the genes/proteins into co-evolutionarily linked protein families. See our newest publications for further details.
 
-The three distance matrices are first visualized as heatmaps with a dendrogram computed with the complete linkage method, that finds similar clusters. Then the Euclidean distance within the matrices is computed and clustered using Ward’s minimum variance method aiming at finding compact spherical clusters, with the implemented squaring of the dissimilarities before cluster updating, for each of the three distance matrices separately, with scaled and unscaled input data. Lastly a principle component analysis (PCA) is performed on each distance matrix and the score plot of the first two principle components visualized, respectively. FunOrder includes scripts adapted to the use on servers and for the integration in various pipelines.
+FunOrder 2 is provided with a database of ascomycete proteomes and can therefore be used to detect co-evolution of proteins in this fungal division. To analyze other divisions, classes, or even kingdoms, a suitable new proteome database must be compiled and tested; see our Wiki for further details.
 
 
 Dependencies
@@ -45,6 +47,11 @@ R packages:
 * gplots
 * car
 * mdatools
+* xlsx
+* cluster
+* NbClust
+* randtests
+
 
 Installation
 ------------
@@ -67,23 +74,27 @@ install.packages('stats') # at the R prompt
 install.packages('gplots') # at the R prompt
 install.packages('car') # at the R prompt
 install.packages('mdatools') # at the R prompt
+install.packages('xlsx') # at the R prompt
+install.packages('cluster') # at the R prompt
+install.packages('NbClust') # at the R prompt
+install.packages('randtests') # at the R prompt
 ```
 
-Now download FunOrder **funorder_v1.tar.xz** and unpack the archive.
+Now download the FunOrder archive **funorder_XX.tar.xz** (where XX is the version number, e.g. 2.0) and unpack it.
 
 ```
-tar -xf funorder_v1.tar.xz
+tar -xf funorder_XX.tar.xz
 ```
 
 Open the scripts funorder.sh, funorder_fasta_only.sh, funorder_server.sh, and funorder_server_fasta_only.sh.
 Change the 'SOURCEDIR' value in line 43 of funorder.sh and funorder_fasta_only.sh, and in line 45 of funorder_server.sh and funorder_server_fasta_only.sh:
 
 ```
-SOURCEDIR=~/funorder_proj/funorder_v1/ 
+SOURCEDIR=~/funorder_proj/funorder_XX/ 
 ```
-to (path to the funorder_v1 directory: e.g. ~/path/to/your/directory/)
+to (path to the funorder_XX directory: e.g. ~/path/to/your/directory/)
 ```
-SOURCEDIR=~/path/to/your/directory/funorder_v1/
+SOURCEDIR=~/path/to/your/directory/funorder_XX/
 ```
 
 You can now add the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -99,7 +110,7 @@ Run FunOrder from the folder containing the gbk file you want to analyze.
 (cd ~/path/to/your/gbk_files)
 
 ```
-sh ~/path/to/directory/funorder_v1/funorder.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
+sh ~/path/to/directory/funorder_XX/funorder.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
 ```
 
 or if you added the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -119,12 +130,25 @@ The output of FunOrder is saved in /file.gbk.analysis/alignment
 
 #### Output files produced by funorder.sh
 
-File                         | Description
------------------------------|------------
-Rplot.pdf                    | PDF file with the Analyze.R output as described in our publication
-strict_distance.matrix       | matrix of the strict distance
-evol_distance.matrix         | matrix of the evolutionary [speciation] distance
+File                                | Description
+------------------------------------|------------
+FunOrder_Supplementary_Rplots.pdf   | PDF file with the Analyze.R output as described in our publication FunOrder 2
+FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
+cluster_definition_pred.xlsx        | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
+strict_distance.matrix              | matrix of the strict distance
+evol_distance.matrix                | matrix of the evolutionary [speciation] distance
+Internal_coevolution_quotient.txt   | text file containing the ICQ analysis
 
+If the automatic clustering fails, the output files are:
+
+File                                   | Description
+---------------------------------------|------------
+FunOrder_Supplementary_Rplots.pdf      | PDF file with the Analyze.R output as described in our publication FunOrder 2
+FunOrder_clustering_Rplots_defined.pdf | PDF file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
+cluster_definition_3.xlsx              | XLSX file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
+strict_distance.matrix                 | matrix of the strict distance
+evol_distance.matrix                   | matrix of the evolutionary [speciation] distance
+Internal_coevolution_quotient.txt      | text file containing the ICQ analysis
 
 
 
@@ -135,7 +159,7 @@ Run FunOrder from the folder containing the fasta file you want to analyze.
 (cd ~/path/to/your/fasta_files)
 
 ```
-sh ~/path/to/directory/funorder_v1/funorder_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
+sh ~/path/to/directory/funorder_XX/funorder_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
 ```
 
 or if you added the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -155,12 +179,25 @@ The output of FunOrder is saved in /file.fasta.analysis/alignment
 
 #### Output files produced by funorder_fasta_only.sh
 
-File                         | Description
------------------------------|------------
-Rplot.pdf                    | PDF file with the Analyze.R output as described in our publication
-strict_distance.matrix       | matrix of the strict distance
-evol_distance.matrix         | matrix of the evolutionary [speciation] distance
+File                                | Description
+------------------------------------|------------
+FunOrder_Supplementary_Rplots.pdf   | PDF file with the Analyze.R output as described in our publication FunOrder 2
+FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
+cluster_definition_pred.xlsx        | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
+strict_distance.matrix              | matrix of the strict distance
+evol_distance.matrix                | matrix of the evolutionary [speciation] distance
+Internal_coevolution_quotient.txt   | text file containing the ICQ analysis
+
+If the automatic clustering fails, the output files are:
 
+File                                   | Description
+---------------------------------------|------------
+FunOrder_Supplementary_Rplots.pdf      | PDF file with the Analyze.R output as described in our publication FunOrder 2
+FunOrder_clustering_Rplots_defined.pdf | PDF file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
+cluster_definition_3.xlsx              | XLSX file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
+strict_distance.matrix                 | matrix of the strict distance
+evol_distance.matrix                   | matrix of the evolutionary [speciation] distance
+Internal_coevolution_quotient.txt      | text file containing the ICQ analysis
 
 
 
@@ -171,7 +208,7 @@ Run FunOrder from the folder containing the gbk file you want to analyze.
 (cd ~/path/to/your/gbk_files)
 
 ```
-sh ~/path/to/directory/funorder_v1/funorder_server.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
+sh ~/path/to/directory/funorder_XX/funorder_server.sh [Thread number] [gbk file] [absolute path to outputdirectory] [database]
 ```
 
 or if you added the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -190,11 +227,26 @@ The output of FunOrder is saved in /file.gbk.analysis/alignment
 
 #### Output files produced by funorder.sh
 
-File                         | Description
------------------------------|------------
-Rplot.pdf                    | PDF file with the Analyze.R output as described in our publication
-strict_distance.matrix       | matrix of the strict distance
-evol_distance.matrix         | matrix of the evolutionary [speciation] distance
+File                                | Description
+------------------------------------|------------
+FunOrder_Supplementary_Rplots.pdf   | PDF file with the Analyze.R output as described in our publication FunOrder 2
+FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
+cluster_definition_pred.xlsx        | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
+strict_distance.matrix              | matrix of the strict distance
+evol_distance.matrix                | matrix of the evolutionary [speciation] distance
+Internal_coevolution_quotient.txt   | text file containing the ICQ analysis
+
+If the automatic clustering fails, the output files are:
+
+File                                   | Description
+---------------------------------------|------------
+FunOrder_Supplementary_Rplots.pdf      | PDF file with the Analyze.R output as described in our publication FunOrder 2
+FunOrder_clustering_Rplots_defined.pdf | PDF file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
+cluster_definition_3.xlsx              | XLSX file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
+strict_distance.matrix                 | matrix of the strict distance
+evol_distance.matrix                   | matrix of the evolutionary [speciation] distance
+Internal_coevolution_quotient.txt      | text file containing the ICQ analysis
+
 
 
 #### Example usage for generic antiSMASH output:
@@ -208,7 +260,7 @@ mkdir funorder_output
 then from within the antiSMASH output-folder run following command:
 
 ```
-for file in *cluster*.gbk; do echo $file; sh ~/path/to/directory/funorder_v1/funorder_server.sh [Thread number] $file [absolute path to "funorder_output" directory] [database] ; done
+for file in *cluster*.gbk; do echo $file; sh ~/path/to/directory/funorder_XX/funorder_server.sh [Thread number] $file [absolute path to "funorder_output" directory] [database] ; done
 ```
 
 This will perform a FunOrder analysis for each cluster predicted by antiSMASH.
@@ -220,7 +272,7 @@ Run FunOrder from the folder containing the fasta file you want to analyze.
 (cd ~/path/to/your/fasta_files)
 
 ```
-sh ~/path/to/directory/funorder_v1/funorder_server_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
+sh ~/path/to/directory/funorder_XX/funorder_server_fasta_only.sh [Thread number] [fasta file] [absolute path to outputdirectory] [database]
 ```
 
 or if you added the FunOrder/pipeline directory to your $PATH environmental variable.
@@ -239,9 +291,24 @@ The output of FunOrder is saved in /file.fasta.analysis/alignment
 
 #### Output files produced by funorder_fasta_only.sh
 
-File                         | Description
------------------------------|------------
-Rplot.pdf                    | PDF file with the Analyze.R output as described in our publication
-strict_distance.matrix       | matrix of the strict distance
-evol_distance.matrix         | matrix of the evolutionary [speciation] distance
+File                                | Description
+------------------------------------|------------
+FunOrder_Supplementary_Rplots.pdf   | PDF file with the Analyze.R output as described in our publication FunOrder 2
+FunOrder_clustering_Rplots_pred.pdf | PDF file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
+cluster_definition_pred.xlsx        | XLSX file with the Analyze_clustering_pred.R output as described in our publication FunOrder 2
+strict_distance.matrix              | matrix of the strict distance
+evol_distance.matrix                | matrix of the evolutionary [speciation] distance
+Internal_coevolution_quotient.txt   | text file containing the ICQ analysis
+
+If the automatic clustering fails, the output files are:
+
+File                                   | Description
+---------------------------------------|------------
+FunOrder_Supplementary_Rplots.pdf      | PDF file with the Analyze.R output as described in our publication FunOrder 2
+FunOrder_clustering_Rplots_defined.pdf | PDF file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
+cluster_definition_3.xlsx              | XLSX file with the Analyze_clustering_defined.R output as described in our publication FunOrder 2
+strict_distance.matrix                 | matrix of the strict distance
+evol_distance.matrix                   | matrix of the evolutionary [speciation] distance
+Internal_coevolution_quotient.txt      | text file containing the ICQ analysis
+
 
diff --git a/funorder_2.0.tar.xz b/funorder_2.0.tar.xz
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c17dde0c5b2d10e999a36cd27be5db12819b8c05074ea360575c04fb8f4aee94
+size 1014203552
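
Note on the `funorder_2.0.tar.xz` entry: what this commit adds is a Git LFS pointer (the `version`, `oid sha256:` and `size` lines), not the archive bytes themselves. The following is a minimal sketch of how a downloaded archive could be checked against such a pointer. It assumes GNU `sha256sum` is available and that the three pointer lines have been saved, without the leading `+` of the diff, to a local file. `verify_lfs_pointer` is a hypothetical helper name and is not part of FunOrder or Git LFS.

```shell
#!/bin/sh
# Sketch only: compare a downloaded file with a Git LFS pointer file.
# verify_lfs_pointer is a hypothetical helper, not shipped with FunOrder.
# Usage: verify_lfs_pointer POINTER_FILE ARCHIVE_FILE
# Returns 0 if both the SHA-256 digest and the byte size match.
verify_lfs_pointer() {
    pointer=$1
    archive=$2

    # Pointer lines have the form "oid sha256:<hex digest>" and "size <bytes>".
    want_oid=$(sed -n 's/^oid sha256://p' "$pointer")
    want_size=$(sed -n 's/^size //p' "$pointer")

    # Assumption: GNU coreutils sha256sum; its first field is the hex digest.
    have_oid=$(sha256sum "$archive" | cut -d' ' -f1)
    # Strip padding that some wc implementations add around the count.
    have_size=$(wc -c < "$archive" | tr -d ' ')

    [ -n "$want_oid" ] && [ "$want_oid" = "$have_oid" ] \
        && [ "$want_size" = "$have_size" ]
}
```

With the pointer saved as, say, `funorder_2.0.pointer`, a call such as `verify_lfs_pointer funorder_2.0.pointer funorder_2.0.tar.xz && echo "archive matches pointer"` would confirm that the download is complete before it is unpacked with `tar -xf`.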
