Many multidimensional/multimodal data sets contain continuous features that are collinear, correlated, or otherwise associated. The goal of spatial transformations is to find a set of [latent variables](https://en.wikipedia.org/wiki/Latent_and_observable_variables) with minimal correlation, so that downstream data analysis is simplified. Common data-transformation approaches include statistically driven methods such as [principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) (PCA), [exploratory factor analysis](https://en.wikipedia.org/wiki/Exploratory_factor_analysis) (EFA), and [canonical-correlation analysis](https://en.wikipedia.org/wiki/Canonical_correlation) (CCA). An algorithmic alternative to these statistical approaches is Iterative Decorrelation Analysis (IDeA). The main advantage of the iterative approach is that it is driven by specific requirements on the generated output. The specific requirements are:
1. All output variables $Q=(q_1,\dots,q_n)$ have a parent input variable in $X=(x_1,\dots,x_n)$ (see Fig. 1).
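The first requirement can be pictured with a minimal NumPy sketch (an illustration of the idea, not the FRESA.CAD implementation): a child feature is residualized against its parent, so the parent passes through unchanged while the pairwise correlation is removed.

```python
# Minimal sketch (NumPy, not FRESA.CAD): remove the correlation between a
# parent feature x1 and its child x2 by residualization, keeping the parent.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=500)  # child, strongly correlated

x1c = x1 - x1.mean()
x2c = x2 - x2.mean()
beta = np.dot(x1c, x2c) / np.dot(x1c, x1c)  # least-squares coefficient

q1 = x1              # the parent input passes through unchanged
q2 = x2 - beta * x1  # child with the parent's contribution removed

print(np.corrcoef(x1, x2)[0, 1])  # strongly correlated inputs
print(np.corrcoef(q1, q2)[0, 1])  # near zero after the transform
```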
```r
library("FRESA.CAD")

data('iris')

## IDeA decorrelation at 0.25 threshold, Pearson and fast estimation
irisDecor <- IDeA(iris, thr = 0.25)
```

### Print the latent variables

```r
print(getLatentCoefficients(irisDecor))
```
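As a sanity check on why a 0.25 threshold is active for iris, the raw feature correlations can be inspected directly (here with NumPy and scikit-learn's copy of the iris data; this is an illustration, not part of FRESA.CAD):

```python
# Inspect iris feature correlations against the 0.25 threshold used above.
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                  # 150 x 4 numeric feature matrix
corr = np.corrcoef(X, rowvar=False)   # 4 x 4 feature correlation matrix

iu = np.triu_indices(4, k=1)          # the six distinct feature pairs
print(np.round(corr[iu], 2))
print(int(np.sum(np.abs(corr[iu]) > 0.25)))  # pairs above the threshold
```

Most of the iris feature pairs exceed the threshold, so the decorrelation step has real work to do on this data set.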
This repository shows some examples of the **FRESA.CAD::IDeA()** function and related FRESA.CAD decorrelation functions.

**irisexample.R** showcases the effect of the IDeA algorithm on the iris data set; the repository includes an example of its output.
**ParkisonAnalysis_TrainTest.Rmd** is a demo that shows the use of UPSTM and BSWiMS to gain insight into the features associated with a relevant outcome. It highlights the process and functions that help authors discern and statistically describe the features associated with a specific outcome.
# Effect of UPSTM-Based Decorrelation on Feature Discovery
Here I showcase how to use the BSWiMS feature selection/modeling function coupled with the Goal Driven Sparse Transformation Matrix (UPSTM) as a pre-processing step to decorrelate highly correlated features. The aims are:
1. To improve model performance by uncovering the hidden information between correlated features.
2. To simplify the interpretation of the machine learning models.
This demo will use:
- *FRESA.CAD::IDeA()*. For the decorrelation of multidimensional data sets.
- *FRESA.CAD::getDerivedCoefficients()*. For the extraction of the models of the newly discovered decorrelated features.
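The coefficient extraction can be pictured with a toy example (plain Python; the names and coefficients below are made up for illustration and are not FRESA.CAD output): each decorrelated feature is a sparse linear combination of the original features, so its "model" is a small coefficient map that can be re-applied to any data row.

```python
# Toy coefficient map for one hypothetical latent feature:
# latent = 1.0 * petal_len - 0.9 * petal_wid  (made-up values)
coeffs = {"petal_len": 1.0, "petal_wid": -0.9}

row = {"sepal_len": 5.1, "petal_len": 1.4, "petal_wid": 0.2}
latent = sum(w * row[name] for name, w in coeffs.items())
print(round(latent, 2))  # 1.4 - 0.9 * 0.2 = 1.22
```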
#### Decorrelation: Training and Testing Sets Creation
I compute a decorrelated version of the training and testing sets using the *IDeA()* function of FRESA.CAD. The first decorrelation is driven by features associated with the outcome; the second finds the UPSTM without the outcome restriction.
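The important discipline here is that the transformation is estimated on the training set only and then applied unchanged to the testing set, so no test information leaks into the transform. A minimal NumPy sketch of that pattern (not the FRESA.CAD internals):

```python
# Fit the decorrelation coefficient on training data, reuse it on test data.
import numpy as np

rng = np.random.default_rng(1)
x_tr = rng.normal(size=(200, 1))
y_tr = 0.7 * x_tr + 0.3 * rng.normal(size=(200, 1))
x_te = rng.normal(size=(50, 1))
y_te = 0.7 * x_te + 0.3 * rng.normal(size=(50, 1))

# Coefficient estimated from the training set only.
beta = np.linalg.lstsq(x_tr, y_tr, rcond=None)[0][0, 0]

y_tr_dec = y_tr - beta * x_tr  # decorrelated training feature
y_te_dec = y_te - beta * x_te  # same fixed transform on the test set

print(abs(np.corrcoef(x_te.ravel(), y_te.ravel())[0, 1]))      # before
print(abs(np.corrcoef(x_te.ravel(), y_te_dec.ravel())[0, 1]))  # after
```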
# Effect of UPSTM-Based Decorrelation on Feature Discovery: The DARWIN Evaluation
Here I showcase how to use the BSWiMS feature selection/modeling function coupled with the Goal Driven Sparse Transformation Matrix (UPSTM) as a pre-processing step to decorrelate highly correlated features. The aims are:
1. To improve model performance by uncovering the hidden information between correlated features.
2. To simplify the interpretation of the machine learning models.
This demo will use:
- FRESA.CAD::IDeA(). For the decorrelation of multidimensional data sets.
- FRESA.CAD::getDerivedCoefficients(). For the extraction of the models of the newly discovered decorrelated features.
#### Decorrelation: Training and Testing Sets Creation
I compute a decorrelated version of the training and testing sets using the *IDeA()* function of FRESA.CAD. The first decorrelation is driven by features associated with the outcome; the second finds the UPSTM without the outcome restriction.