R scripts for Random Forest (RF) and RBF-kernel SVM multiclass classification of Acute Myeloid Leukemia (AML) subtypes from gene expression profiles (GSE13159): LASSO feature selection, SMOTE oversampling, 10-fold cross-validation, variable importance plot, PCA plot, and normalized confusion matrix.

lnv-louis/AML-Subtypes-Classification-ML

This repository provides an R-based machine learning pipeline for classifying Acute Myeloid Leukemia (AML) subtypes from gene expression data in the GSE13159 dataset (Affymetrix HG-U133 Plus 2.0 platform). The pipeline applies Random Forest (RF) and radial basis function (RBF) kernel Support Vector Machine (SVM) classifiers. It combines them with LASSO feature selection, SMOTE class balancing, 10-fold cross-validation, and hyperparameter tuning to improve classification accuracy.
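
The class-balancing step is handled by SMOTE, which creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The repository lists the smotefamily package for this step. The block below is not the repository's code. It is a minimal base-R sketch of the idea, and the helper name `smote_sketch()` and the toy data are hypothetical. If oversampling is combined with cross-validation, it is usually applied only inside each training fold so that synthetic samples do not leak into the evaluation folds.

```r
# Minimal SMOTE-style oversampling sketch (hypothetical helper, not the repository's code).
# x_min: numeric matrix of minority-class samples (rows) by features (columns).
# n_new: number of synthetic samples to generate; k: nearest neighbours considered.
smote_sketch <- function(x_min, n_new, k = 5) {
  x_min <- as.matrix(x_min)
  n <- nrow(x_min)
  stopifnot(n >= 2, n_new >= 1)
  k <- min(k, n - 1)                    # at most n - 1 other minority rows exist
  d <- as.matrix(dist(x_min))           # Euclidean distances between minority rows
  diag(d) <- Inf                        # a row is never its own neighbour
  # n x k matrix holding the indices of each row's k nearest minority neighbours
  nn <- do.call(rbind, lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)]))
  seed_rows <- sample.int(n, n_new, replace = TRUE)   # minority rows to start from
  # pick one random neighbour (column 1..k of nn) for every seed row
  nb_rows <- nn[cbind(seed_rows, sample.int(k, n_new, replace = TRUE))]
  gap <- runif(n_new)                   # interpolation weight in (0, 1) per sample
  a <- x_min[seed_rows, , drop = FALSE]
  b <- x_min[nb_rows, , drop = FALSE]
  synth <- a + gap * (b - a)            # gap recycles down rows: row i is scaled by gap[i]
  rownames(synth) <- NULL
  synth
}

# Usage on toy data: 6 minority-class samples measured on 3 "genes"
set.seed(42)
x_minority <- matrix(rnorm(18), nrow = 6,
                     dimnames = list(NULL, c("g1", "g2", "g3")))
synthetic <- smote_sketch(x_minority, n_new = 10, k = 3)
dim(synthetic)  # 10 3
```

Each synthetic row lies on the line segment between two real minority samples. The new points therefore stay inside the region already covered by that class instead of being duplicated or drawn at random.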

The repository includes an R script that automates data preprocessing, model training, and evaluation. It requires RStudio (version 2024.12.0-467) and the following R packages: GEOquery, limma, randomForest, caret, glmnet, ggplot2, pheatmap, e1071, smotefamily, AnnotationDbi, and reshape2. Running the script executes the full AML classification pipeline and produces biomarker selection results, classification accuracy results, and plots of feature importance and model performance. Researchers in bioinformatics, computational biology, and cancer classification can adapt this pipeline for further work.
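
The evaluation step reports classification accuracy and, according to the repository description, a normalized confusion matrix. The block below is a minimal base-R sketch that computes both from vectors of true and predicted labels. It assumes "normalized" means each row (true class) is scaled to sum to 1, and the repository's own convention may differ. The helper `normalized_confusion()` and the toy labels are hypothetical. caret's `confusionMatrix()` already reports counts and accuracy, so the only addition here is the row normalization.

```r
# Row-normalized confusion matrix sketch (hypothetical helper, not the repository's code).
# Each row is a true class divided by that class's sample count, so the
# diagonal holds per-class recall and every non-empty row sums to 1.
normalized_confusion <- function(truth, pred) {
  lev <- sort(union(as.character(truth), as.character(pred)))  # shared class labels
  counts <- table(Truth = factor(truth, levels = lev),
                  Predicted = factor(pred, levels = lev))
  # Divide each row by its total; pmax() avoids 0/0 for classes absent from truth
  normalized <- unclass(counts) / pmax(rowSums(counts), 1)
  list(counts = counts,
       normalized = normalized,
       accuracy = sum(diag(counts)) / sum(counts))
}

# Usage on toy labels for three classes
truth <- c("A", "A", "A", "B", "B", "C")
pred  <- c("A", "A", "B", "B", "B", "A")
res <- normalized_confusion(truth, pred)
round(res$accuracy, 3)  # 0.667
```

Normalizing by row keeps small subtypes visible. Without it, the raw counts of the largest classes dominate the matrix, which matters when class sizes in a dataset such as GSE13159 are unequal.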

This project is released under the MIT License, allowing free use, modification, and distribution. Researchers and data scientists are encouraged to contribute improvements, explore additional datasets, and integrate alternative machine learning models to enhance AML classification accuracy. For full reproducibility, the dataset can be accessed via the GEO database (GSE13159), and all preprocessing steps are documented within the code. Further inquiries or contributions can be directed to the repository owner at lelouis.lnv@gmail.com.
