Genomic Exploration of HPV-Associated Head and Neck Squamous Cell Carcinoma Occurrence in Bangladesh: An Integrative Histopathological Analysis and Molecular Profiling of HPV
Status: Ongoing research. Primary datasets are withheld until publication.
This repository documents an integrative study that combines clinical metadata, histopathology, HPV molecular typing, and machine learning to investigate the role of HPV in head and neck squamous cell carcinoma (HNSCC) in Bangladesh.
Highlights
- Clinical cohort curation and histopathological review
- Genomic exploration of HPV‑positive cases (focus on high‑risk types)
- IHC markers and protein expression profiling
- ML models for risk stratification and feature importance
Note: Only scripts, notebooks, and representative figures are shared. Raw/CSV data are excluded pre‑publication.
Representative figures (to be replaced with finalized panels as the project evolves):
Addictions_UpSet_Plot_Positive_Samples.png — overlap of addictions/risk factors (e.g., Smoking, Alcohol, Betel quid, Smokeless tobacco) among cancer‑positive samples.
HPV_Type_Distribution.png — distribution of HPV types among HPV‑positive samples.
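The overlap that an UpSet plot displays is, at its core, a count of samples per exact combination of risk factors. The following is a minimal sketch of that tally using only the standard library. The function name `combination_counts` and the record layout (one dict per sample) are assumptions made for illustration, and the toy records are invented, not study data.

```python
from collections import Counter

# Risk factors named in the figure description above.
RISK_FACTORS = ("Smoking", "Alcohol", "Betel quid", "Smokeless tobacco")


def combination_counts(records):
    """Count how many samples share each exact combination of risk factors.

    `records` is an iterable of dicts mapping a factor name to True/False;
    a missing key is treated as False.
    """
    counts = Counter()
    for record in records:
        present = frozenset(f for f in RISK_FACTORS if record.get(f))
        counts[present] += 1
    return counts


# Invented toy records for illustration only; these are NOT study data.
toy = [
    {"Smoking": True, "Betel quid": True},
    {"Smoking": True, "Betel quid": True},
    {"Alcohol": True},
    {},
]

counts = combination_counts(toy)
for combo, n in counts.most_common():
    print(sorted(combo) or ["(none)"], n)
```

Using `frozenset` keys makes each combination order-independent, so "Smoking + Betel quid" and "Betel quid + Smoking" land in the same bar.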
SV_thesis_data_clean.ipynb → Data import, cleaning, preprocessing
SV_thesis_stat.ipynb → Cohort summaries, hypothesis tests, publication‑ready plots
Cells that require restricted data are clearly marked. You can wire them to local paths once data access is granted.
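One way to wire a restricted-data cell to a local path is to read the data directory from an environment variable and skip the cell gracefully when the file is absent. This is only a sketch: the variable name `HNSCC_DATA_DIR`, the helper `resolve_data_file`, and the file name `cohort.csv` are hypothetical and are not defined by the repository.

```python
import os
from pathlib import Path

# Hypothetical environment variable; the repository does not define one.
DATA_ENV_VAR = "HNSCC_DATA_DIR"


def resolve_data_file(filename, env=None):
    """Return the path to a restricted file, or None if it is not available locally."""
    env = os.environ if env is None else env
    base = env.get(DATA_ENV_VAR)
    if not base:
        return None
    path = Path(base).expanduser() / filename
    return path if path.is_file() else None


# "cohort.csv" is a placeholder name, not an actual file in this project.
cohort_path = resolve_data_file("cohort.csv")
if cohort_path is None:
    print(f"Restricted data not found; set {DATA_ENV_VAR} to run this cell.")
else:
    print(f"Loading {cohort_path}")
```

Keeping the path outside the notebook means no local directory names end up committed alongside the shared code.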
We benchmark classifiers on the clinical cohort using PyCaret with:
- Stratified cross-validation
- Hyperparameter tuning
- Automatic plots for AUROC and Feature Importance
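The steps above could be sketched with PyCaret's functional API roughly as follows. This is an illustration under assumptions, not the project's actual notebook code: the function names `benchmark` and `max_stratified_folds`, the seed, and the choice to rank and tune by AUC are all assumptions. The fold helper is included because small clinical cohorts can have a rare class with fewer samples than the requested number of folds.

```python
from collections import Counter


def max_stratified_folds(labels, requested=10):
    """Cap the fold count at the size of the rarest class.

    The result is never below 2, the minimum needed for cross-validation.
    """
    smallest = min(Counter(labels).values())
    return max(2, min(requested, smallest))


def benchmark(df, target, requested_folds=10, seed=42):
    """Compare, tune, and plot classifiers with PyCaret.

    `df` is a pandas DataFrame and `target` is the label column name.
    PyCaret is imported lazily so the helper above works without it.
    """
    from pycaret.classification import (
        compare_models, plot_model, setup, tune_model,
    )

    folds = max_stratified_folds(df[target].tolist(), requested_folds)
    setup(
        data=df,
        target=target,
        fold_strategy="stratifiedkfold",  # stratified cross-validation
        fold=folds,
        session_id=seed,                  # fixed seed for reproducibility
        verbose=False,
    )
    best = compare_models(sort="AUC")         # rank candidates by cross-validated AUC
    tuned = tune_model(best, optimize="AUC")  # hyperparameter search on the top model
    plot_model(tuned, plot="auc", save=True)      # AUROC curve saved as an image
    plot_model(tuned, plot="feature", save=True)  # requires coefficients or importances
    return tuned
```

The feature-importance plot only works for estimators that expose coefficients or importances, so a sketch like this may need a fallback if the top-ranked model does not.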
Other outputs:
ML/logs.log: session information and cross-validation results
1) Create and activate a virtual environment
Linux / macOS
python -m venv .venv
source .venv/bin/activate
Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1
2) Install dependencies
pip install "pycaret[classification]" pandas numpy matplotlib scipy statsmodels upsetplot jupyter
3) Launch notebooks
jupyter notebook Code/
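If a notebook fails on import, a quick check of the environment can help. The sketch below reports which of the packages from the install step are not importable in the active interpreter. The helper name `missing_packages` is an assumption for illustration, and the list mirrors the `pip install` line above minus `jupyter`.

```python
from importlib.util import find_spec

# Import names of the analysis packages from the install step.
REQUIRED = [
    "pycaret", "pandas", "numpy", "matplotlib",
    "scipy", "statsmodels", "upsetplot",
]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("All set." if not missing else f"Missing: {', '.join(missing)}")
```

`find_spec` only locates a package without importing it, so the check stays fast even for heavy libraries.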






