Skip to content

This repository focuses on classifying cancer types using ML algorithms. It includes data preprocessing, feature selection, model training, and evaluation using Logistic Regression, Decision Trees, Random Forest, SVC, KNN, and Naive Bayes. Explainable AI techniques like SHAP and LIME are implemented for transparency.

Notifications You must be signed in to change notification settings

NEHAKIRANNAYAK/Cancer-Classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Cancer Classification Using Machine Learning

Overview

This project aims to classify cancer categories using machine learning algorithms. The dataset undergoes preprocessing, feature selection, and model training to achieve accurate predictions.

Dataset

  • The dataset obtained from a medical institution for research work, is loaded from an Excel file.
  • Categorical variables are encoded using Label Encoding.
  • Specific irrelevant or redundant columns are dropped before training.

Algorithms Used

The following machine learning models were implemented and evaluated:

  • Logistic Regression
  • Decision Tree Classifier
  • Random Forest Classifier
  • Support Vector Classifier (SVC)
  • K-Nearest Neighbors (KNN)
  • Naive Bayes (GaussianNB)

Steps Involved

  1. Data Preprocessing: Handling categorical data with encoding.
  2. Feature Selection: Dropping unnecessary columns.
  3. Splitting the Data: Train-test split (70-30 ratio).
  4. Model Training: Training various classification models.
  5. Evaluation: Assessing model performance using appropriate metrics.
  6. Explainable AI: Implementing techniques to interpret model predictions, including feature importance analysis and SHAP (SHapley Additive exPlanations) values to improve transparency.
  7. Hyperparameter Tuning: Optimizing model parameters using Grid Search and Randomized Search.
  8. Deployment Considerations: Potential deployment strategies using Flask or FastAPI for real-world applications.

Dependencies

Ensure you have the following Python libraries installed before running the notebook:

pip install pandas numpy scikit-learn shap

Usage

Run the Jupyter Notebook step by step to preprocess data, train models, and evaluate results.

Explainable AI (XAI)

To ensure model transparency, we use:

  • Feature Importance: Analyzing which features contribute most to model predictions.
  • SHAP Values: A method to explain individual predictions.
  • LIME (Local Interpretable Model-Agnostic Explanations): Generating interpretable explanations for predictions.

Future Enhancements

  • Deep Learning Models: Exploring neural networks for improved accuracy.
  • Automated Machine Learning (AutoML): Implementing AutoML frameworks for automated feature engineering and model selection.
  • Integration with Clinical Data: Enhancing datasets by integrating real-world clinical data for better generalization.

About

This repository focuses on classifying cancer types using ML algorithms. It includes data preprocessing, feature selection, model training, and evaluation using Logistic Regression, Decision Trees, Random Forest, SVC, KNN, and Naive Bayes. Explainable AI techniques like SHAP and LIME are implemented for transparency.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published