Breast cancer is the most prevalent cancer among women globally, accounting for nearly 25% of all cancer cases in women. In 2015 alone, it affected over 2.1 million individuals. The disease originates when cells in the breast begin to grow uncontrollably, forming tumors that can be detected on an X-ray or felt as lumps.
The dataset used in this project is the Breast Cancer Dataset, sourced from Kaggle.
The dataset contains 569 rows and 32 columns, of which 30 are used as features to train the model. These features are statistical measurements of a breast tumor's shape, size, texture, and boundary characteristics, captured through imaging, and are used to classify the tumor as benign or malignant.
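As a rough sketch of how such data can be prepared (the file name `data.csv` and the `id`/`diagnosis` column names are assumptions based on the usual Kaggle export, not taken from the notebooks):

```python
import pandas as pd

# Hypothetical file name -- adjust to wherever the Kaggle CSV is saved.
df = pd.read_csv("data.csv")

# Some exports include a trailing, entirely empty column; drop any such columns.
df = df.dropna(axis=1, how="all")

# 'diagnosis' is M (malignant) or B (benign); map it to 1/0 for training.
y = (df["diagnosis"] == "M").astype(int).to_numpy()

# Drop the id and label columns, keeping the 30 numeric feature columns.
X = df.drop(columns=["id", "diagnosis"]).to_numpy()

print(X.shape, y.shape)  # expected: (569, 30) (569,)
```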
The model is trained using Logistic Regression and optimized using Gradient Descent. A cost function is computed to evaluate how well the model predicts the classification, and this value is minimized iteratively.
A plot of the cost function against the number of iterations shows convergence around 1000 iterations, indicating proper learning during training.
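The notebooks implement this themselves; the following is only a minimal, illustrative sketch of a vectorized logistic regression trained with gradient descent (the function names and hyperparameters here are arbitrary, not those used in the project). Plotting `costs` against the iteration index gives the kind of convergence curve described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, learning_rate=0.01, iterations=1000):
    """Gradient descent on the logistic (cross-entropy) cost, fully vectorized."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    costs = []

    for _ in range(iterations):
        # Forward pass: predicted probability of the positive (malignant) class.
        p = sigmoid(X @ w + b)

        # Cross-entropy cost; the small epsilon guards against log(0).
        eps = 1e-15
        cost = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        costs.append(cost)

        # Vectorized gradients of the cost with respect to w and b.
        dw = X.T @ (p - y) / m
        db = np.mean(p - y)

        # Gradient descent update.
        w -= learning_rate * dw
        b -= learning_rate * db

    return w, b, costs
```

In practice, the raw measurements vary over very different scales, so standardizing the features (zero mean, unit variance) before training helps gradient descent converge reliably.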
- Initial version: implemented without vectorization (see `breastcancerpredictionwithoutvectorisation.ipynb`)
  - Training time: 446 seconds
  - Model accuracy: 98.24%
  - Improvement suggestions: add cross-validation, more data, or quadratic features to improve accuracy.
- Optimized version: vectorized implementation (see `breastcancerprediction.ipynb`)
  - Training time: 2 seconds
  - Result: identical accuracy with drastically reduced computation time
  - This highlights the power of NumPy-based vectorized operations for scalable model training (see the sketch below).
Both notebooks are provided for educational comparison — demonstrating how vectorization can lead to massive improvements in performance without changing model logic.
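To illustrate why vectorization makes such a difference, here is a toy comparison (synthetic data, not the project's actual training loop) of the same gradient computed with explicit Python loops versus a single NumPy matrix product:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(569, 30))
errors = rng.normal(size=569)  # stands in for (predictions - labels)

# Loop-based gradient: one Python-level multiply-add per sample and feature.
start = time.time()
dw_loop = np.zeros(30)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        dw_loop[j] += X[i, j] * errors[i]
dw_loop /= X.shape[0]
loop_time = time.time() - start

# Vectorized gradient: the same computation as a single matrix product.
start = time.time()
dw_vec = X.T @ errors / X.shape[0]
vec_time = time.time() - start

print(np.allclose(dw_loop, dw_vec))              # True -- identical result
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.6f}s")
```

The results are numerically identical; only the amount of Python-level interpreter overhead changes, which is exactly the gap between the 446-second and 2-second training times reported above.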
Made with ❤️ for machine learning and performance optimization.