Breast Cancer Classification with Logistic Regression

This project demonstrates a beginner-friendly implementation of Logistic Regression to classify breast cancer tumors as benign or malignant, using the Breast Cancer Wisconsin dataset.

Objectives

Preprocess and clean the dataset (remove NaNs, encode labels)
Perform Exploratory Data Analysis (EDA) with visual insights
Train and evaluate a Logistic Regression classifier
Tune the decision threshold and explain the sigmoid output
Save all visuals and results for GitHub display
Present the work in a clean, job-ready format
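
The sigmoid and decision-threshold objective can be illustrated with a minimal sketch. The function names `sigmoid` and `classify` and the example scores are hypothetical, chosen for illustration; they are not taken from the notebooks.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score (log-odds) to a probability in (0, 1)."""
    # Branch on the sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(scores, threshold=0.5):
    """Label a tumor malignant (1) when P(malignant) >= threshold, else benign (0)."""
    return [1 if sigmoid(z) >= threshold else 0 for z in scores]


# Hypothetical log-odds for four tumors (illustrative values, not from the dataset).
scores = [-3.0, -0.5, 0.2, 2.5]

default_labels = classify(scores)        # default threshold of 0.5
cautious_labels = classify(scores, 0.3)  # lower threshold flags more tumors as malignant

print(default_labels)
print(cautious_labels)
```

Lowering the threshold trades precision for recall: more tumors are flagged as malignant, so fewer malignant cases are missed at the cost of more false alarms.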

Project Structure

breast-cancer-logistic-regression/
│
├── data/                 # Raw & cleaned dataset
├── images/               # Visuals (histograms, heatmaps, confusion matrix, ROC)
├── notebooks/
│   ├── 01_data_cleaning.ipynb
│   ├── 02_eda.ipynb
│   └── 03_logistic_model.ipynb
├── requirements.txt      # Required libraries
├── LICENSE               # MIT License
└── README.md             # This file

Dataset

Source: Kaggle – Breast Cancer Wisconsin Dataset
Shape: 569 samples, 32 columns (an ID, the diagnosis label, and 30 numeric features)
Target: diagnosis (original B/M labels encoded as integers)
- 0 = Benign
- 1 = Malignant
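
A cleaning and encoding step along these lines might look like the sketch below. It assumes the Kaggle CSV layout (an `id` column, a `diagnosis` column holding `B`/`M`, and an empty trailing column such as `Unnamed: 32`). The `clean` helper and the tiny in-memory frame are hypothetical stand-ins so the example runs without the file in `data/`.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop empty columns, the ID, and incomplete rows, then encode the label."""
    # Remove columns that contain no values at all (assumed: "Unnamed: 32").
    df = df.dropna(axis=1, how="all")
    # The identifier carries no predictive signal.
    df = df.drop(columns=["id"], errors="ignore")
    # Remove rows that still have missing values.
    df = df.dropna(axis=0, how="any")
    # Encode the target: benign -> 0, malignant -> 1.
    df = df.assign(diagnosis=df["diagnosis"].map({"B": 0, "M": 1}))
    return df.reset_index(drop=True)


# Tiny illustrative frame, not real rows from the dataset.
raw = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "diagnosis": ["M", "B", "B"],
        "radius_mean": [17.99, 13.54, None],
        "Unnamed: 32": [None, None, None],
    }
)

cleaned = clean(raw)
print(cleaned)
```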

Tasks Completed

Task	Status
Data Cleaning & Preprocessing	✅
Exploratory Data Analysis (EDA)	✅
Logistic Regression Training	✅
Evaluation (Accuracy, F1, AUC, etc.)	✅
Confusion Matrix + ROC Curve	✅
Threshold tuning	✅
All visuals saved in `images/`	✅
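
The training and evaluation tasks above can be sketched roughly as follows. This is not the notebook code. It assumes scikit-learn's bundled copy of the Wisconsin data rather than the Kaggle CSV in `data/`, and the 80/20 split, the scaling step, `random_state=42`, and the 0.5 threshold are illustrative choices. Scores from this sketch will not necessarily match the ones reported in this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled dataset: 569 samples, 30 numeric features.
data = load_breast_cancer()
X = data.data
# The bundled copy encodes malignant as 0; flip it so 1 = malignant, as in this README.
y = 1 - data.target

# Hold out 20% for testing, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize the features, then fit logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Column 1 of predict_proba is P(malignant), the sigmoid output.
proba = model.predict_proba(X_test)[:, 1]
threshold = 0.5  # illustrative; lower it to favor recall
pred = (proba >= threshold).astype(int)

accuracy = accuracy_score(y_test, pred)
auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.3f}  roc_auc={auc:.3f}")
```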

Model Evaluation

Metric	Score
Accuracy	0.9386
Precision	0.97
Recall	0.86
F1 Score	0.91
ROC-AUC Score	0.986

Confusion matrix and ROC curve visualizations are saved in the images folder.
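
For readers new to these metrics, the sketch below shows how accuracy, precision, recall, and F1 follow from confusion-matrix counts. The counts are hypothetical: they were picked because they round to the scores above under an assumed 114-sample test split, and other counts could do the same. The actual confusion matrix is the one saved in the images folder.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so none of the denominators is zero.
    """
    precision = tp / (tp + fp)  # of tumors flagged malignant, how many were
    recall = tp / (tp + fn)     # of malignant tumors, how many were flagged
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts (malignant = positive class), not read from the notebooks.
metrics = classification_metrics(tp=37, fp=1, fn=6, tn=70)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```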

Tools & Environment Used

Jupyter Notebook – Interactive coding and documentation
VS Code – Code editor used for development
Git – Version control
GitHub – Project hosting and portfolio building
Kaggle – Dataset source
Markdown – For clean documentation
Python 3.10+ – Language used

Libraries Used

Python 3
Pandas, NumPy
Matplotlib, Seaborn
scikit-learn

How to Run This Project

# Clone this repository
git clone https://github.com/your-username/breast-cancer-logistic-regression.git
cd breast-cancer-logistic-regression

# Install required libraries
pip install -r requirements.txt

# Launch JupyterLab, then run the notebooks in notebooks/ in order (01, 02, 03)
jupyter lab

tirtha103/Breast-Cancer-Classification-with-Logistic-Regression
