Identification of Potential Biomarkers for 2022 Mpox Virus Infection: A Transcriptomic Network Analysis and Machine Learning Approach
Debnath, J.P., Hossen, K. et al. Identification of potential biomarkers for 2022 Mpox virus infection: a transcriptomic network analysis and machine learning approach. Scientific Reports, 15, 2922 (2025). https://doi.org/10.1038/s41598-024-80519-7
This project analyzes microarray and RNA-Seq data to identify differentially expressed genes (DEGs) and validates them with machine learning techniques. The workflow covers data extraction, normalization, DEG identification, and machine learning validation, supported by visualizations such as volcano plots, PCA, t-SNE, and ROC curves.
- Data was retrieved from the NCBI GEO repository.
- Relevant samples and conditions were selected based on predefined criteria.
- Normalization: Expression data were normalized to ensure comparability across samples.
- Outlier Removal: Outliers were removed to improve data quality.
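The normalization and outlier-removal steps can be illustrated with a minimal, standard-library-only sketch. The specific choices here (a log2 transform with a pseudocount of 1, per-sample median centering, and a cutoff of 3 median absolute deviations) are assumptions made for illustration; the README does not state which methods the project scripts use, and the function names are hypothetical.

```python
import math
import statistics


def log2_median_center(samples):
    """Log2-transform each sample and subtract its median (assumed scheme)."""
    normalized = {}
    for name, values in samples.items():
        logged = [math.log2(v + 1.0) for v in values]  # +1 avoids log2(0)
        center = statistics.median(logged)
        normalized[name] = [x - center for x in logged]
    return normalized


def flag_outlier_samples(samples, k=3.0):
    """Names of samples whose log2 median is more than k MADs from the cohort median."""
    medians = {
        name: statistics.median(math.log2(v + 1.0) for v in values)
        for name, values in samples.items()
    }
    cohort = statistics.median(medians.values())
    mad = statistics.median(abs(m - cohort) for m in medians.values())
    if mad == 0:
        return []  # no spread between samples, nothing to flag
    return sorted(n for n, m in medians.items() if abs(m - cohort) > k * mad)


if __name__ == "__main__":
    # Toy intensities (hypothetical); s4 is on a very different scale.
    toy = {
        "s1": [1, 3, 7],
        "s2": [3, 7, 15],
        "s3": [1, 3, 7],
        "s4": [1023, 2047, 4095],
        "s5": [3, 7, 15],
    }
    print(flag_outlier_samples(toy))
    print(log2_median_center(toy)["s1"])
```

In practice, dedicated routines from the analysis packages would normally replace this arithmetic; the sketch only shows what "comparable across samples" means numerically.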
- Microarray Data:
  - DEG analysis was performed using the `limma` package.
- RNA-Seq Data:
  - DEG analysis was performed using the `DESeq2` package.
- Log Fold Change (LFC):
  - Calculated for DEGs and visualized using volcano plots.
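As a rough illustration of how a log fold change feeds a volcano plot, the sketch below computes a simple log2 ratio of group means and labels each gene as up, down, or not significant. The cutoffs (|LFC| >= 1, adjusted p < 0.05), the pseudocount, and the function names are assumptions for illustration only. `limma` and `DESeq2` estimate fold changes and p-values with their own statistical models, which this arithmetic does not reproduce.

```python
import math


def log2_fold_change(case_values, control_values, pseudocount=1.0):
    """Log2 ratio of mean expression (case over control), with a pseudocount."""
    case_mean = sum(case_values) / len(case_values)
    control_mean = sum(control_values) / len(control_values)
    return math.log2((case_mean + pseudocount) / (control_mean + pseudocount))


def classify_gene(lfc, adj_p, lfc_cutoff=1.0, alpha=0.05):
    """Volcano-plot style label; thresholds are assumed, not taken from the project."""
    if adj_p < alpha and lfc >= lfc_cutoff:
        return "up"
    if adj_p < alpha and lfc <= -lfc_cutoff:
        return "down"
    return "not significant"


if __name__ == "__main__":
    # Hypothetical gene with mean 7 in infected samples and mean 1 in controls.
    lfc = log2_fold_change([7, 7], [1, 1])
    print(lfc, classify_gene(lfc, adj_p=0.01))
```

On a volcano plot, the x-axis would be the LFC and the y-axis the negative log10 of the (adjusted) p-value, with the two cutoffs drawn as vertical and horizontal guide lines.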
- Feature Selection:
  - DEGs were further filtered using `PyCaret`, selecting the top 10 features (genes).
  - Visualized using PCA and t-SNE plots.
- Model Evaluation:
  - Evaluated using ROC curve analysis to validate predictive performance.
- Biomarker Discovery:
  - Identified six key biomarkers associated with the 2022 MPXV infection, validated through machine learning models.
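The idea behind ranking candidate genes and checking them with ROC analysis can be sketched without any external library. The example below is not the PyCaret feature-selection procedure or the `pROC` analysis used in the project; it is a simplified stand-in that scores each gene by its single-gene ROC AUC and keeps the top `n`. The function names and toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_genes_by_auc(expression, labels, n=10):
    """Rank genes by how well each one alone separates the two classes."""
    ranked = []
    for gene, values in expression.items():
        auc = roc_auc(values, labels)
        # Treat up- and down-regulated genes alike: AUC 0.1 is as useful as 0.9.
        ranked.append((max(auc, 1.0 - auc), gene))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [gene for _, gene in ranked[:n]]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # 1 = infected, 0 = control (toy data)
    expression = {
        "GENE_A": [5, 6, 1, 2],
        "GENE_B": [3, 1, 2, 4],
        "GENE_C": [1, 2, 5, 6],
    }
    print(top_genes_by_auc(expression, labels, n=2))
```

A single-gene AUC ignores interactions between genes, which is one reason a model-based selection such as the one run in the notebook may produce a different ranking.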
- Run Python and R Scripts for Machine Learning:
  - Jupyter Notebook (`1_Pycrate.ipynb`):
    - Open and run the cells in a Jupyter environment.
  - R Scripts for PCA, t-SNE, and ROC:

        Rscript 2_tSNE.R
        Rscript 3_PCA.R
        Rscript 4_pROC.R
- Run R Scripts for Microarray Data Analysis:
  - Required Packages: `affy`, `limma`, `GEOquery`, `ggplot2`
  - Run:

        Rscript 1_Expression_MicroArray.R
        Rscript 2_UmapPlot_MicroArray.R
        Rscript 3_TopTable_MicroArray.R
        Rscript 5_DEGs_Identification_MicroArray.R
- Run R Scripts for RNA-Seq Data Analysis:
  - Required Packages: `DESeq2`, `ggplot2`, `data.table`, `tidyverse`
  - Run:

        Rscript 1_install_packages.R
        Rscript 2_load_data.R
        Rscript 3_PCA.R
        Rscript 4_DGE_Normalization.R
        Rscript 5_DGE_LFC.R
        Rscript 6_volcano.R
- `limma` v3.54.2, `DESeq2` v1.38.3, `VennDiagram` v1.7.3, `GOplot` v1.0.2, `corrplot` v0.92, `TBtools-II` v2.097, `Rtsne` v0.17, `pROC` v1.18.5, `randomForest` v4.7.1.1, `ggplot2` v3.5.1
- `PyCaret` v3.3.2, `Pandas` v2.1.4, `SciPy` v1.11.4, `Joblib` v1.3.2, `Scikit-Learn` v1.4.2, `Sktime` v0.26.0, `Pmdarima` v2.0.4, `XGBoost`, `LightGBM`
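To check whether a local Python environment matches the versions listed above, a small sketch using the standard library's `importlib.metadata` may help. The distribution names in `PINNED` are assumed to be the usual PyPI names for these packages; XGBoost and LightGBM are left out because no version is listed for them. The `check_versions` helper is hypothetical and not part of the project.

```python
from importlib import metadata

# Versions copied from the list above; PyPI distribution names are assumed.
PINNED = {
    "pycaret": "3.3.2",
    "pandas": "2.1.4",
    "scipy": "1.11.4",
    "joblib": "1.3.2",
    "scikit-learn": "1.4.2",
    "sktime": "0.26.0",
    "pmdarima": "2.0.4",
}


def check_versions(pinned, get_version=metadata.version):
    """Map each package to (wanted, installed or None, whether they match)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {installed} [{status}]")
```

A mismatch does not necessarily break the notebook, but matching versions is the safer route when trying to reproduce the published results.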
For any questions or issues, please contact:

