Logistic regression model for breast cancer prediction using imaging features, achieving 98.24% accuracy on Kaggle dataset without vectorization.

debjeetchanda/Breast-Cancer-Detection-

Breast Cancer Detection

Breast cancer is the most prevalent cancer among women globally, accounting for nearly 25% of all cancer cases. In 2015 alone, it affected over 2.1 million individuals. The disease originates when cells in the breast begin to grow uncontrollably, forming tumors that can be detected via X-ray or felt as lumps.

The dataset used in this project is sourced from Kaggle: Breast Cancer Dataset

The dataset contains 569 rows and 32 columns: a sample ID, the diagnosis label, and 30 features that are used to train the model. These features are statistical measurements of a breast tumor's shape, size, texture, and boundary characteristics, captured through imaging, and they are used to classify the tumor as benign or malignant.
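As a rough illustration of how such a table might be prepared for training, the sketch below encodes the diagnosis labels as 0/1 and standardizes each feature column. It assumes the labels are the strings "M" (malignant) and "B" (benign); the helper names `encode_labels` and `standardize` are hypothetical, and whether the notebooks scale the features at all is an assumption, not something stated here.

```python
import numpy as np

def encode_labels(diagnosis):
    """Map 'M' (malignant) to 1.0 and anything else (e.g. 'B') to 0.0."""
    return (np.asarray(diagnosis) == "M").astype(float)

def standardize(X):
    """Scale each feature column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant columns unchanged instead of dividing by zero
    return (X - mu) / sigma

# Tiny made-up example (not real dataset values).
y = encode_labels(["M", "B", "B", "M"])
X = standardize([[17.9, 10.4], [13.5, 14.4], [12.4, 15.7], [20.6, 17.8]])
```

Scaling is often helpful for gradient descent because features on very different scales can slow convergence.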

Model and Training Approach

The model is trained using Logistic Regression and optimized using Gradient Descent. A cost function is computed to evaluate how well the model predicts the classification, and this value is minimized iteratively.
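The cost function is not spelled out here; a common choice for logistic regression, assumed in this sketch, is the binary cross-entropy. The symbols are assumptions for illustration: $m$ training examples, feature vectors $x^{(i)}$, labels $y^{(i)} \in \{0, 1\}$, weights $w$, bias $b$, and learning rate $\alpha$.

```latex
\begin{aligned}
h^{(i)} &= \sigma\!\left(w^{\top} x^{(i)} + b\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
J(w, b) &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h^{(i)} + \left(1 - y^{(i)}\right) \log\!\left(1 - h^{(i)}\right) \right] \\
\frac{\partial J}{\partial w_j} &= \frac{1}{m} \sum_{i=1}^{m} \left(h^{(i)} - y^{(i)}\right) x_j^{(i)}, \qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(h^{(i)} - y^{(i)}\right) \\
w_j &\leftarrow w_j - \alpha \frac{\partial J}{\partial w_j}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}
\end{aligned}
```

Each gradient descent iteration evaluates the gradients on the full training set and applies the two updates in the last line.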

A plot of the cost function against the number of iterations shows the cost leveling off at around 1000 iterations, indicating that gradient descent has converged.
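A cost curve of this kind can be produced by recording the cost at every iteration. The sketch below is a minimal example on synthetic stand-in data, not the notebook code; the function name `train`, the learning rate, and the data are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for logistic regression.

    Returns the weights, the bias, and the cost recorded at each iteration.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    costs = []
    eps = 1e-12  # keeps log() away from zero
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)
        # Binary cross-entropy, evaluated before this iteration's update.
        costs.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
        err = p - y
        w -= lr * (X.T @ err) / m
        b -= lr * err.mean()
    return w, b, costs

# Synthetic, linearly separable stand-in data (not the Kaggle dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + 0.3 > 0).astype(float)

w, b, costs = train(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == (y == 1))
```

Plotting `costs` against the iteration index (for example with matplotlib) gives the convergence curve; a curve that flattens suggests that further iterations change the parameters very little.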

Performance and Optimization

  • Initial version: Implemented without vectorization (see breastcancerpredictionwithoutvectorisation.ipynb)
  • Training time: 446 seconds
  • Model accuracy: 98.24%
  • Improvement suggestions: Use cross-validation for a more reliable accuracy estimate, and try more data or quadratic features to improve accuracy.

After Vectorization

  • Optimized version: Vectorized implementation (see breastcancerprediction.ipynb)
  • Training time: 2 seconds
  • Result: Identical accuracy (98.24%), with training time cut from 446 seconds to 2 seconds, roughly a 220x speedup
  • This highlights the power of using NumPy-based vectorized operations for scalable model training.

Both notebooks are provided for educational comparison. Together they demonstrate how vectorization can yield large performance improvements without changing the model logic.
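To make the difference concrete, the sketch below computes the same logistic regression gradient twice: once with explicit Python loops and once with NumPy matrix operations. It is an illustration under assumptions, not the notebook code; the function names are hypothetical, and the data is random with the same shape as the feature matrix (569 x 30).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_loops(X, y, w, b):
    """Gradient of the cross-entropy cost, one example and one feature at a time."""
    m, n = X.shape
    dw = [0.0] * n
    db = 0.0
    for i in range(m):
        z = b
        for j in range(n):
            z += w[j] * X[i, j]
        err = sigmoid(z) - y[i]
        for j in range(n):
            dw[j] += err * X[i, j]
        db += err
    return np.array(dw) / m, db / m

def gradient_vectorized(X, y, w, b):
    """Same gradient, expressed as matrix-vector products."""
    err = sigmoid(X @ w + b) - y
    return X.T @ err / X.shape[0], err.mean()

# Random stand-in data with the dataset's shape (values are not real measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(569, 30))
y = rng.integers(0, 2, size=569).astype(float)
w = rng.normal(size=30) * 0.1
b = 0.0

dw_loop, db_loop = gradient_loops(X, y, w, b)
dw_vec, db_vec = gradient_vectorized(X, y, w, b)
```

Both functions return the same values up to floating-point rounding, which is why accuracy is unaffected. The vectorized version moves the inner loops into NumPy's compiled routines, and that is where the time savings come from.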


Made with ❤️ for machine learning and performance optimization.
