tirtha103/Breast-Cancer-Classification-with-Logistic-Regression

Breast Cancer Classification with Logistic Regression

This project demonstrates a beginner-friendly implementation of Logistic Regression to classify breast cancer tumors as benign or malignant, using the Breast Cancer Wisconsin dataset.


Objectives

  • Preprocess and clean the dataset (remove NaNs, encode labels)
  • Perform Exploratory Data Analysis (EDA) with visual insights
  • Train and evaluate a Logistic Regression classifier
  • Tune the decision threshold and explain the sigmoid output
  • Save all visuals and results for GitHub display
  • Present the work in a clean, job-ready format
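
To illustrate the sigmoid and decision-threshold objective above, here is a minimal sketch using only the standard library. The function names `sigmoid` and `classify`, the example scores, and the choice of 1 = malignant are assumptions made for illustration; they are not taken from the notebooks.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued model score z to a probability in (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(z: float, threshold: float = 0.5) -> int:
    """Return 1 (assumed here to mean malignant) if sigmoid(z) >= threshold."""
    return int(sigmoid(z) >= threshold)


# Hypothetical linear scores (w . x + b) for five tumors.
scores = [-2.0, -0.5, 0.0, 0.5, 2.0]

# Lowering the threshold flags more samples as positive (higher recall,
# lower precision); raising it does the opposite.
for threshold in (0.3, 0.5, 0.7):
    labels = [classify(z, threshold) for z in scores]
    print(f"threshold={threshold}: {labels}")
```

In a screening setting a missed malignant tumor is usually costlier than a false alarm, which is one reason to consider a threshold below 0.5.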

Project Structure

breast-cancer-logistic-regression/
│
├── data/                  # Raw & cleaned dataset
├── images/                # Visuals (histograms, heatmaps, confusion matrix, ROC)
├── notebooks/
│   ├── 01_data_cleaning.ipynb
│   ├── 02_eda.ipynb
│   └── 03_logistic_model.ipynb
├── requirements.txt       # Required libraries
├── LICENSE                # MIT License
└── README.md              # This file


Dataset

The project uses the Breast Cancer Wisconsin dataset, sourced from Kaggle.


Tasks Completed

  • Data Cleaning & Preprocessing
  • Exploratory Data Analysis (EDA)
  • Logistic Regression Training
  • Evaluation (Accuracy, F1, AUC, etc.)
  • Confusion Matrix + ROC Curve
  • Threshold tuning
  • All visuals saved in images/
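
The training and evaluation tasks above can be sketched end to end as follows. This is not the code from the notebooks. It assumes scikit-learn's bundled copy of the Breast Cancer Wisconsin data (`load_breast_cancer`) rather than the Kaggle CSV, an 80/20 stratified split, feature scaling, and `random_state=42`, all chosen for illustration, so the scores it prints will not necessarily match the ones reported in this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = data.data
# scikit-learn encodes 0 = malignant, 1 = benign. Flip the labels so that
# malignant is the positive class (1), an assumption made for this sketch.
y = (data.target == 0).astype(int)

# Hold out 20% of the samples, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit logistic regression on the training part only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Column 1 of predict_proba is the probability of the positive class.
proba = model.predict_proba(X_test)[:, 1]
threshold = 0.5  # adjust to trade precision against recall
pred = (proba >= threshold).astype(int)

accuracy = accuracy_score(y_test, pred)
f1 = f1_score(y_test, pred)
auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.4f}  f1={f1:.2f}  roc_auc={auc:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no information from the test split leaks into preprocessing.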

Model Evaluation

Metric           Score
Accuracy         93.86%
Precision        0.97
Recall           0.86
F1 Score         0.91
ROC-AUC Score    0.986

Confusion matrix and ROC curve visualizations are saved in the images folder.
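
For readers new to these metrics, the sketch below shows how accuracy, precision, recall, and F1 follow from the four cells of a confusion matrix. The counts are hypothetical: they were picked so that the rounded results agree with the table above, and they are not copied from the project's saved confusion matrix. The helper name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)   # of predicted positives, how many are correct
    recall = tp / (tp + fn)      # of actual positives, how many were found
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, with malignant treated as the positive class.
metrics = classification_metrics(tp=37, fp=1, fn=6, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

ROC-AUC cannot be derived from a single confusion matrix, because it summarizes performance across all thresholds and needs the predicted probabilities.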


Tools & Environment Used

  • Jupyter Notebook – Interactive coding and documentation
  • VS Code – Code editor used for development
  • Git – Version control
  • GitHub – Project hosting and portfolio building
  • Kaggle – Dataset source
  • Markdown – For clean documentation
  • Python 3.10+ – Language used

Libraries Used

  • Python 3
  • Pandas, NumPy
  • Matplotlib, Seaborn
  • scikit-learn

How to Run This Project

# Clone this repository
git clone https://github.com/your-username/breast-cancer-logistic-regression.git
cd breast-cancer-logistic-regression

# Install required libraries
pip install -r requirements.txt

# Run notebooks
jupyter lab