knightlab-analyses/stability-analyses


stability-analyses

This repository contains the code used to reproduce all simulation studies and experimental microbiome data applications in the Stability manuscript.

I. General code:

code_method folder: contains code to reproduce simulation results for continuous outcomes

getStability.R: function to calculate Stability Index

cv_method.R: code for the 4 selected feature selection methods, with user-defined parameter grids and cross-validation for parameter tuning, applied to continuous outcomes

cv_method_binary_update.R: code for the 4 selected feature selection methods, with user-defined parameter grids and cross-validation for parameter tuning, applied to binary outcomes

stab_data_applications.R: function to perform bootstrap-based hypothesis testing for continuous outcomes

stab_data_applications_binary.R: function to perform bootstrap-based hypothesis testing for binary outcomes

bootstrap_test_compLasso_rf.R: general functions for comparing feature selection methods via bootstrap-based hypothesis testing for continuous outcomes

bootstrap_test_compLasso_rf_binary.R: general functions for comparing feature selection methods via bootstrap-based hypothesis testing for binary outcomes

source code for compositional lasso (continuous outcome) is available at: https://www.math.pku.edu.cn/teachers/linw/software.html
source code for compositional lasso (binary outcome) is available at: https://github.com/UVic-omics/Microbiome-Variable-Selection
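
As an illustration of what getStability.R computes, here is a minimal Python sketch of a frequency-based stability index. It assumes the index follows the estimator of Nogueira et al. (this README does not state the formula), and it is not the repository's R code. The function name stability_index and the 0/1 matrix input layout are hypothetical.

```python
def stability_index(selections):
    """Stability of feature selection across repeated runs.

    selections: list of M equal-length 0/1 lists; row = one run
    (e.g. one resample), column = one feature, 1 = feature selected.
    Assumed formula (Nogueira et al. style estimator):
        1 - mean_f(s_f^2) / ((k_bar/d) * (1 - k_bar/d))
    """
    m = len(selections)
    if m < 2:
        raise ValueError("need at least two selection runs")
    d = len(selections[0])
    # Selection frequency of each feature across runs.
    freqs = [sum(row[f] for row in selections) / m for f in range(d)]
    # Unbiased sample variance of each feature's selection indicator.
    variances = [m / (m - 1) * p * (1 - p) for p in freqs]
    # Average number of features selected per run.
    k_bar = sum(sum(row) for row in selections) / m
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("undefined when no feature or every feature is always selected")
    return 1 - (sum(variances) / d) / denom


# Identical selections in every run give the maximum value of 1.
print(stability_index([[1, 0, 1, 0]] * 3))
```

Under this definition, values near 1 mean the same features are selected in nearly every run, while lower values mean the selected set changes from run to run.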

II. Simulation part (within simulations folder):

sim_data_generation folder: contains code to generate simulated data

sim_dat_ind_toeplitz: code to generate simulated data with Independent and Toeplitz correlation designs
sim_dat_block.R: code to generate simulated data with Block correlation design
run_sim_data.sh: bash commands for running simulation data generation code on HPC
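
The three correlation designs named above (Independent, Toeplitz, Block) can be illustrated with a short Python sketch that builds the corresponding correlation matrices. This is not the repository's R code; the function names, the rho ** |i - j| form of the Toeplitz design, and the equal-sized blocks are assumptions made for illustration.

```python
def independent_corr(p):
    """Identity matrix: features are uncorrelated."""
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]


def toeplitz_corr(p, rho):
    """Toeplitz design (assumed form): correlation decays as rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]


def block_corr(p, block_size, rho):
    """Block design (assumed form): correlation rho within a block, 0 across blocks."""
    def entry(i, j):
        if i == j:
            return 1.0
        same_block = i // block_size == j // block_size
        return rho if same_block else 0.0
    return [[entry(i, j) for j in range(p)] for i in range(p)]


# Print a small Toeplitz matrix row by row.
for row in toeplitz_corr(4, 0.5):
    print(row)
```

A matrix like these would typically serve as the covariance of a multivariate normal draw before any transformation to compositional data; that sampling step is omitted here.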

code_sim_cts folder: contains code to reproduce simulation results for continuous outcomes

cv_sim_apply.R: general functions for applying the selected feature selection methods to simulated data with continuous outcomes

1. compute Stability and MSE for different simulation scenarios
ind_results.R: code for comparing 3 methods (lasso, elastic net, random forests) in simulated data with Independent design and continuous outcomes
toe_results.R: code for comparing 3 methods (lasso, elastic net, random forests) in simulated data with Toeplitz design and continuous outcomes
block_results.R: code for comparing 3 methods (lasso, elastic net, random forests) in simulated data with Block design and continuous outcomes
CL_sim_apply.R: code for obtaining results for compositional lasso in all simulation correlation designs with continuous outcomes

2. hypothesis testing with bootstrap for selected simulation scenarios
boot_CL_testing.R: code for calculating bootstrapped confidence intervals for the compositional lasso method in simulated data with continuous outcomes
boot_RF_testing.R: code for calculating bootstrapped confidence intervals for the random forests method in simulated data with continuous outcomes

3. bash commands
run_sim_cts.sh: bash commands for running simulation code for continuous outcomes on HPC
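
The bootstrapped confidence intervals produced by boot_CL_testing.R and boot_RF_testing.R can be pictured with a generic percentile-bootstrap sketch in Python. The repository's exact procedure is not described in this README and may differ; the function names, the percentile method, and the default settings below are all assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_ci(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(data).

    Resamples the observations with replacement n_boot times and
    returns the alpha/2 and 1 - alpha/2 empirical quantiles.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    estimates.sort()
    # Clamp indices so small n_boot values cannot index out of range.
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Example with a hypothetical per-run metric; any statistic function can be passed.
lower, upper = bootstrap_ci([0.61, 0.58, 0.70, 0.66, 0.59, 0.73, 0.64], mean)
print(lower, upper)
```

In a comparison of two methods, the same idea applies to the difference of their statistics computed on each resample; an interval that excludes 0 then suggests a difference between the methods.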

code_sim_bin folder: contains code to reproduce simulation results for binary outcomes

cv_sim_apply_binary_update.R: general functions for applying the selected feature selection methods to simulated data with binary outcomes

1. compute Stability and AUC for different simulation scenarios
ind_results_binary_update.R: code for comparing all 4 methods in simulated data with Independent design and binary outcomes
toe_results_binary_update.R: code for comparing all 4 methods in simulated data with Toeplitz design and binary outcomes
block_results_binary_update.R: code for comparing all 4 methods in simulated data with Block design and binary outcomes

2. hypothesis testing with bootstrap for selected simulation scenarios
boot_sim_binary.R: code for calculating bootstrapped confidence intervals for the compositional lasso and random forests methods in simulated data with binary outcomes

3. bash commands
run_sim_bin.sh: bash commands for running simulation code for binary outcomes on HPC
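
For binary outcomes, prediction performance is summarized by AUC. As a reminder of what that number measures, here is a minimal Python sketch that computes AUC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as one half. It is not the repository's R code, and the function name auc is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 0/1 outcomes; scores: predicted scores (higher = more likely 1).
    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive ranked above negative
            elif sp == sn:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ordered correctly: 0.75
```

The pairwise form is quadratic in the sample size, which is fine for illustration; rank-based implementations compute the same value faster.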

notebooks_sim_cts folder: contains R notebooks that summarize simulation results for continuous outcomes

notebooks_sim_bin folder: contains R notebooks that summarize simulation results for binary outcomes

results_summary_cts folder: contains the tables output by the notebooks in the notebooks_sim_cts folder

results_summary_bin folder: contains the tables output by the notebooks in the notebooks_sim_bin folder

figures_combined folder: contains figures for both continuous and binary outcomes, generated by notebook 6_make_figures_combined in the notebooks_sim_bin folder

III. Application part (within data_application folder):

code_cts folder: contains code for real data applications to the BMI & soil datasets for continuous outcomes

code_bin folder: contains code for real data applications to the BMI & soil datasets for binary outcomes

notebooks_applications folder: contains R notebooks that summarize microbiome application results for continuous and binary outcomes

88soils folder: contains data and application results for the soil dataset

BMI folder: contains data and application results for the BMI dataset
