🧠 XAI-SOMs: Explainable AI using Self-Organizing Maps for Healthcare

Visualize, Understand & Validate Patient-Level Insights through Interpretable SOM-based AI Pipelines

Features • Installation • Usage • Documentation • Contributing

🌟 Overview

XAI-SOMs is a modular and interpretable healthcare analytics framework using Self-Organizing Maps (SOMs) combined with Explainable AI techniques. It enables medical professionals and researchers to discover clinically coherent clusters, derive individual patient explanations, and validate the outputs using domain-driven constraints and visualizations.

Built with scalability and real-world deployment in mind, the system supports regulatory reporting, clinical decision dashboards, and interactive data exploration for transparent medical AI.

📚 Dataset - Diabetes Data Set

📎 Source: Diabetes Data Set – Kaggle

The Diabetes Data Set is a structured medical dataset containing information on diabetes diagnoses and related features. It is commonly used for binary classification tasks in healthcare ML systems to predict the presence or absence of diabetes.

📁 Dataset Highlights:

Total Records: 1,015 patient records
Type: Tabular (Structured)
Task: Binary Classification — Diabetes (Positive/Negative)

✅ Why This Dataset?

Rich in biomedical indicators associated with diabetes.
Ideal for unsupervised learning tasks like Self-Organizing Maps (SOMs) to visualize and cluster patient patterns.
Suitable for XAI (Explainable AI) methods for interpreting model decisions in healthcare.
Balanced and clean, making it suitable for both beginners and research projects.

📌 Dataset Link: https://www.kaggle.com/datasets/mathchi/diabetes-data-set

🚀 Features of XAI-SOMs

Self-Organizing Maps for Healthcare – Unsupervised learning that maps patients onto a 2D grid based on medical similarity.
Explainability-Driven Analysis – Cluster-level topic modeling, per-patient feature contributions, and interpretable visualizations.
Regulatory-Grade Reporting – Automatic PDF/JSON summary reports on patient distributions, clusters, and risks.
Medical Coherence Validator – Domain-specific validator that checks whether the output clusters make clinical sense.
Comprehensive Logging & Error Tracing – Structured logs via the custom logger test_logger for easier debugging and traceability.
Interactive Visualization Tools – Patient journey mapping, heatmaps, and SOM plots for user-driven exploration.
Reproducible & Modular Pipeline – Swap in different encoders, normalizers, SOM grid sizes, or patient features.
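
To make the first feature concrete, here is a minimal, dependency-free sketch of how a SOM places patients on a 2D grid. It is an illustration only, not the project's code: the project lists MiniSom for this step, and the function names `train_som` and `best_matching_unit`, the hyperparameters, and the toy feature values below are all assumptions made for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the grid cell whose weight vector is closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny SOM and return weights[row][col] as lists of floats."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.1)  # neighbourhood radius shrinks
        x = rng.choice(data)                   # one random sample per step
        br, bc = best_matching_unit(weights, x)
        for r in range(rows):
            for c in range(cols):
                # Gaussian neighbourhood: cells near the BMU move more.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                w = weights[r][c]
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Made-up patient vectors, assumed to be already scaled to [0, 1].
patients = [[0.10, 0.15], [0.12, 0.20], [0.85, 0.90], [0.90, 0.80]]
som = train_som(patients, rows=3, cols=3)
for p in patients:
    print(p, "->", best_matching_unit(som, p))
```

Each patient ends up at the grid cell of its best-matching unit (BMU), so patients with similar feature vectors tend to land in the same or neighbouring cells.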

📰 Published Article

🔗 Coming Soon on Medium — Explaining Clinical AI with XAI-SOMs

🛠️ Installation

Step 1: Clone the Repository

git clone https://github.com/priyam-hub/XAI-SOMs.git
cd XAI-SOMs

Step 2: Setup Environment

# Run Environment Setup Script
bash env_setup.sh

# Choose Python or Conda environment creation

# Activate the environment before proceeding

Step 3: Add a `.env` File (Optional for Custom Config)

# Rename '.sample_env' to '.env' if present
# Add environment variables if needed (e.g., data paths, log level)
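
As a rough sketch of how such settings could be consumed in Python, the snippet below reads a data path and a log level from the process environment with fallbacks. The variable names `DATA_PATH` and `LOG_LEVEL` and the helper `load_config` are hypothetical; the README does not say which variables the pipeline reads or how the `.env` file is loaded.

```python
import os


def load_config(env=None):
    """Read optional settings, falling back to defaults when a variable is unset."""
    env = os.environ if env is None else env
    return {
        # DATA_PATH and LOG_LEVEL are assumed names, used only for illustration.
        "data_path": env.get("DATA_PATH", "data/healthcare_sample.csv"),
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
    }


print(load_config())
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.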

Step 4: Prepare Your Dataset

Replace the placeholder dataset in data/ with your own.
Ensure the file contains the columns the pipeline expects, such as patient_id, age, and diagnosis.
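
A quick header check before running the pipeline can catch a mismatched file early. The sketch below uses only the three column names given as examples above; the complete list of required columns is not stated here, so `EXPECTED_COLUMNS` and the helper `missing_columns` are assumptions to adapt.

```python
import csv
import io

# Only the example columns named in this README; extend to match your data.
EXPECTED_COLUMNS = {"patient_id", "age", "diagnosis"}


def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return sorted(set(expected) - present)


# In-memory stand-in for a real file, with made-up values.
sample = io.StringIO("patient_id,age,glucose\n1,50,148\n")
print(missing_columns(sample))  # ['diagnosis']
```

For a real file, open it with `open("data/healthcare_sample.csv", newline="")` and pass the file object to `missing_columns`.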

Step 5: Run the Full Pipeline

python main.py

Step 6: Visualize & Validate

Use the logs to trace each patient's journey and cluster mapping.
Open the generated visualizations in the results/ folder.

🧰 Technology Stack

Python – Core language for the entire pipeline 🔗 Python Docs

Pandas & NumPy – Data wrangling and numeric analysis 🔗 Pandas | NumPy

Scikit-learn – For scaling, clustering, NMF topic modeling 🔗 Scikit-learn

MiniSom – Self-Organizing Map implementation 🔗 MiniSom GitHub

Matplotlib / Seaborn – For plotting and heatmaps 🔗 Matplotlib | Seaborn

Logging – Custom test_logger for consistent monitoring 🔗 Python Logging
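
For readers unfamiliar with Python's logging module, a named logger like test_logger might be set up roughly as follows. This is a sketch under assumptions: the helper `get_logger`, the format string, and the sample message are illustrative, and the project's actual configuration lives in logger/logger.py.

```python
import logging


def get_logger(name="test_logger", level=logging.INFO):
    """Return a named logger with a single stream handler attached."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s | %(levelname)s | %(name)s | %(message)s"))
        logger.addHandler(handler)
    return logger


log = get_logger()
# Hypothetical message showing how a patient-to-cell mapping could be traced.
log.info("Mapped patient %s to SOM cell %s", "P001", (2, 3))
```

Because `logging.getLogger` returns the same object for the same name, every module that asks for test_logger shares one configuration.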

📁 Project Structure

XAI-SOMs/
├── .env                            # Env variables (optional)
├── LICENSE                         # MIT License
├── README.md                       # Project documentation
├── env_setup.sh                    # Shell script to setup environment
├── requirements.txt                # Python package dependencies
├── main.py                         # Pipeline runner script
├── logger/
│   └── logger.py                   # Custom logging format setup
├── data/
│   └── healthcare_sample.csv       # Input dataset (replace with real data)
├── results/
│   ├── visualizations/             # Plots and charts
│   └── reports/                    # JSON / PDF medical reports
└── src/
    ├── preprocessor/               # Feature engineering and transformations
    ├── som_trainer/                # SOM training logic
    ├── cluster_identifier/         # Cluster mapping and topic modeling
    ├── validator/                  # Medical coherence validator
    ├── visualizer/                 # Heatmap, BMU, and patient visualizations
    ├── report_generator/           # Generate final reports
    └── utils/                      # Data loaders, helper functions

🔮 Future Work Roadmap

⚙️ Phase 1: Dashboard & API (1–2 Weeks)

Flask-based clinical dashboard for real-time patient cluster analysis
REST API for integration with EMRs or health platforms
Upload custom CSVs, view journeys and recommendations interactively

🧠 Phase 2: Deep Clinical Knowledge (2–3 Weeks)

Add support for ICD codes, medication mapping, and treatment outcome tracking
Ontology-aware explanations using UMLS or SNOMED CT
Incorporate physician feedback into model outputs

🤖 Phase 3: LLMs & Vector Embeddings (3–4 Weeks)

Use BioBERT/ClinicalBERT or SentenceTransformers for richer representations
Store and query using vector databases like FAISS/Qdrant
Natural language explanations via LangChain + LLaMA 3 style agents

📜 License

This project is licensed under the MIT License. See the LICENSE file for full legal terms.

Made by Priyam Pal - AI and Data Science Engineer

🔗 Visit the Repo • ⭐ Star if you find it useful!
