This project leverages machine learning algorithms to predict whether a breast tumor is benign or malignant using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The goal is to compare various models and identify the most accurate and reliable one for diagnosis.
Breast cancer is a major health challenge, and early detection is crucial for improving patient outcomes. In this project, we implement and evaluate four machine learning algorithms:
- Feedforward Neural Network (FNN)
- Support Vector Machine (SVM)
- Extreme Gradient Boosting (XGBoost)
- Logistic Regression (LR)
Each model's performance is evaluated using key metrics: accuracy, precision, recall (sensitivity), specificity, F1-score, and AUC (Area Under the Curve).
Name: Wisconsin Diagnostic Breast Cancer (WDBC)
Source: UCI Machine Learning Repository
- Instances: 569 samples
- Features: 30 numeric tumor characteristics (e.g., radius, perimeter, concavity)
- Target Variable:
  - Benign (B) → 0
  - Malignant (M) → 1
The dataset is split into 80% training and 20% testing for model evaluation.
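For illustration, a minimal sketch of this split using scikit-learn's built-in copy of the WDBC data (the notebook may instead load the CSV from the UCI repository; the stratified split and `random_state=42` are assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# WDBC: 569 samples, 30 numeric features.
# scikit-learn encodes malignant = 0 and benign = 1, so flip the labels
# to match the mapping used in this project (M -> 1, B -> 0).
data = load_breast_cancer()
X, y = data.data, 1 - data.target

# 80% train / 20% test; stratify to preserve the benign/malignant ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (455, 30) (114, 30)
```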
- Feedforward Neural Network (FNN): Captures non-linear patterns through hidden layers and backpropagation.
- Support Vector Machine (SVM): Trains on high-dimensional data with various kernels (linear, RBF).
- XGBoost: A fast, efficient gradient boosting algorithm ideal for structured data.
- Logistic Regression (LR): Serves as a baseline model with straightforward interpretability.
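For reference, here is a minimal sketch of how these four models might be defined; the architectures, kernels, and hyperparameters below are illustrative assumptions, and the notebook's tuned models (e.g., `best_svm_model`, `best_xgb_model`) may differ:

```python
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from tensorflow import keras

# Feedforward Neural Network: two hidden ReLU layers, sigmoid output
fnn_model = keras.Sequential([
    keras.layers.Input(shape=(30,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
fnn_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# SVM with an RBF kernel; probability=True is needed to compute AUC
svm_model = SVC(kernel="rbf", C=1.0, probability=True)

# XGBoost: gradient-boosted trees for structured/tabular data
xgb_model = XGBClassifier(n_estimators=200, learning_rate=0.1, eval_metric="logloss")

# Logistic Regression: simple, interpretable linear baseline
logistic_model = LogisticRegression(max_iter=1000)
```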
- Install Required Libraries:
  `pip install numpy pandas scikit-learn matplotlib seaborn tensorflow keras xgboost`
- Clone or Download the Repository:
  `git clone <your-repository-url>`
  `cd <your-repository-folder>`
- Launch Jupyter Notebook:
  `jupyter notebook breast_cancer_final.ipynb`
- Preprocess the Data (a sketch of these steps follows this list):
  - Missing values are handled.
  - The target labels are mapped: 'M' → 1, 'B' → 0.
  - Features are scaled using StandardScaler.
- Train and Evaluate the Models: Execute the cells to train each model:
  `evaluate_model(fnn_model, "Feedforward Neural Network", X_train, y_train, X_test, y_test)`
  `evaluate_model(best_svm_model, "SVM", X_train, y_train, X_test, y_test)`
  `evaluate_model(best_xgb_model, "XGBoost", X_train, y_train, X_test, y_test)`
  `evaluate_model(logistic_model, "Logistic Regression", X_train, y_train, X_test, y_test)`
- Performance Metrics: Each model's accuracy, precision, recall (sensitivity), specificity, F1-score, and AUC will be printed for both the training and testing datasets (a sketch of such a metrics helper follows this list).
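Continuing from the sketches above (and reusing the `X_train`/`X_test` names), here is a minimal, assumed version of the preprocessing and of what an `evaluate_model`-style helper might compute with scikit-learn metrics. The helper name and call signature match the calls shown above, but its internals here are illustrative, and the models are assumed to be already fitted:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

# Label mapping, if working from the raw CSV (hypothetical dataframe `df`):
# df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

# Feature scaling: fit on the training split only, reuse for the test split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

def evaluate_model(model, name, X_train, y_train, X_test, y_test):
    """Print the metrics listed above for both the training and test splits."""
    for split, X, y in [("train", X_train, y_train), ("test", X_test, y_test)]:
        # sklearn/XGBoost expose predict_proba; a Keras model returns probabilities directly
        if hasattr(model, "predict_proba"):
            proba = model.predict_proba(X)[:, 1]
        else:
            proba = model.predict(X).ravel()
        pred = (proba >= 0.5).astype(int)

        tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
        print(f"{name} ({split}): "
              f"acc={accuracy_score(y, pred):.4f}  "
              f"prec={precision_score(y, pred):.4f}  "
              f"recall/sens={recall_score(y, pred):.4f}  "
              f"f1={f1_score(y, pred):.4f}  "
              f"spec={tn / (tn + fp):.4f}  "
              f"auc={roc_auc_score(y, proba):.4f}")
```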
| Model | Accuracy (Test) | Precision | Recall | AUC |
|---|---|---|---|---|
| Feedforward Neural Network | 98.25% | 100.00% | 95.34% | 99.14% |
| Support Vector Machine (SVM) | 98.25% | 100.00% | 95.34% | 97.67% |
| XGBoost | 97.36% | 97.61% | 95.34% | 96.97% |
| Logistic Regression | 97.36% | 97.61% | 95.34% | 96.97% |
- FNN and SVM achieved the highest test accuracy (98.25%), making them excellent candidates for breast cancer prediction.
- XGBoost and Logistic Regression also performed strongly, showing that both ensemble methods and simple linear classifiers are competitive on this task.
- Hyperparameter tuning: Further refinement to optimize model performance.
- Advanced Ensemble Models: Explore stacking multiple algorithms.
- Model Explainability: Integrate SHAP or LIME for better interpretability (see the sketch after this list).
- Deployment: Deploy the models in a web-based interface for real-time diagnosis.
- Scalability: Apply the models to larger, more complex datasets to test generalizability.
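As an illustration of the explainability direction only (not part of the current notebook), a minimal SHAP sketch for the trained XGBoost model, reusing names from the sketches above and assuming the `shap` package is installed:

```python
import shap

# TreeExplainer works directly with gradient-boosted tree models such as XGBoost
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

# Global view of which tumor characteristics drive malignancy predictions
shap.summary_plot(shap_values, X_test, feature_names=data.feature_names)
```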
- Sanika Mulik
- Nikita Kedari
- Juee Shinde
This project was developed for the TY BTech CSE course Artificial Intelligence and Expert Systems, Semester V (July-December 2024).
Supervisor: Prof. Pramod Mali
Institution: MIT World Peace University, Pune