This project implements a binary classification model to predict breast cancer diagnosis (malignant vs benign) using neural networks with Keras/TensorFlow.
The project uses the cancer.csv dataset containing breast cancer diagnostic features:
- Features: 30 numerical features including mean radius, texture, perimeter, area, smoothness, compactness, concavity, etc.
- Target: Binary classification (1 = Malignant, 0 = Benign)
- Size: Multiple samples with comprehensive feature measurements
- Input Layer: 30 features
- Hidden Layers: 2 dense layers with 100 neurons each (ReLU activation)
- Output Layer: 1 neuron with sigmoid activation for binary classification
- Optimizer: Adam
- Loss Function: Binary crossentropy
- Train/test split (75%/25%)
- Feature standardization using StandardScaler
- Random state = 2 for reproducibility
- Epochs: 10
- Batch size: 16
- Validation data: Test set
- Training/validation accuracy curves
- Training/validation loss curves
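The settings listed above can be sketched end to end as follows. This is a minimal illustration, not the notebook's exact code: it uses randomly generated stand-in data in place of cancer.csv, and it assumes the scaler is fitted on the training split only. The variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Stand-in data (assumption): 200 random samples with 30 features each.
# Replace X and y with the features and 0/1 labels loaded from cancer.csv.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = (X[:, 0] + X[:, 1] > 0).astype("int32")

# 75%/25% train/test split with random_state=2 for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2
)

# Standardize features. Fitting on the training split only is an assumption.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 30 inputs -> two Dense(100, ReLU) layers -> one sigmoid output.
model = keras.Sequential([
    keras.Input(shape=(30,)),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 10 epochs, batch size 16, with the test set used as validation data.
history = model.fit(
    X_train_scaled, y_train,
    epochs=10, batch_size=16,
    validation_data=(X_test_scaled, y_test),
    verbose=0,
)

# Accuracy and loss curves for the training and validation sets.
fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
ax_acc.plot(history.history["accuracy"], label="train")
ax_acc.plot(history.history["val_accuracy"], label="validation")
ax_acc.set_xlabel("epoch")
ax_acc.set_ylabel("accuracy")
ax_acc.legend()
ax_loss.plot(history.history["loss"], label="train")
ax_loss.plot(history.history["val_loss"], label="validation")
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("loss")
ax_loss.legend()
fig.tight_layout()
```

In a notebook the figure renders automatically; in a plain script, a call to `plt.show()` or `fig.savefig(...)` would be needed.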
- Week_2.ipynb: Main notebook with complete implementation
- data/cancer.csv: Breast cancer dataset
- README.md: Project documentation
Create a requirements.txt file with:
pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
scikit-learn>=1.0.0
tensorflow>=2.8.0
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
- Create a data/ folder and place cancer.csv inside it
- Mount Google Drive (if using Colab)
- Load and preprocess the dataset from data/cancer.csv
- Train the neural network model
- Evaluate performance with accuracy/loss plots
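The loading step might look like the sketch below. The helper name `load_features_and_target` and its `target_column` parameter are hypothetical, since the column layout of cancer.csv is not described here; the sketch assumes every column other than the target is numeric.

```python
from pathlib import Path

import pandas as pd


def load_features_and_target(csv_path, target_column):
    """Read a CSV and split it into a feature matrix and a label vector.

    Hypothetical helper: assumes all non-target columns are numeric.
    """
    df = pd.read_csv(Path(csv_path))
    if target_column not in df.columns:
        raise KeyError(
            f"{target_column!r} not found; available columns: {list(df.columns)}"
        )
    y = df[target_column].to_numpy()
    X = df.drop(columns=[target_column]).to_numpy(dtype="float32")
    return X, y
```

A call would then look like `X, y = load_features_and_target("data/cancer.csv", target_column="diagnosis")`, where `"diagnosis"` is a placeholder for whatever the label column is actually called. Any non-feature column, such as an ID, would need to be dropped first.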
The model is expected to reach approximately the following performance:
- Training Accuracy: ~95-98%
- Validation Accuracy: ~92-96%
- Training Loss: Decreases steadily over epochs
- Validation Loss: Expected to remain stable, without significant overfitting
The model tracks both training and validation metrics to monitor performance and detect potential overfitting patterns.
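One simple way to read those metrics is sketched below, assuming a dictionary shaped like the `history.history` attribute returned by Keras `fit`. The function name, the `tolerance` threshold, and the loss values are illustrative assumptions, not results from this project.

```python
def summarize_overfitting(history, tolerance=0.02):
    """Flag a likely overfitting pattern from per-epoch loss lists.

    history: dict with "loss" and "val_loss" lists (one value per epoch).
    tolerance: how far validation loss may rise above its minimum
               before it is flagged (assumed threshold).
    """
    train_loss = history["loss"]
    val_loss = history["val_loss"]

    # Epoch (0-based index) at which validation loss was lowest.
    best = min(range(len(val_loss)), key=val_loss.__getitem__)

    # How much validation loss has risen since its best epoch.
    val_rise = val_loss[-1] - val_loss[best]

    # Overfitting pattern: validation loss rises while training loss keeps falling.
    suspected = val_rise > tolerance and train_loss[-1] < train_loss[best]

    return {
        "best_epoch": best + 1,  # reported 1-based
        "final_gap": val_loss[-1] - train_loss[-1],
        "val_loss_rise": val_rise,
        "overfitting_suspected": suspected,
    }


# Made-up loss values for illustration only.
example_history = {
    "loss": [0.60, 0.40, 0.30, 0.20, 0.10],
    "val_loss": [0.62, 0.45, 0.40, 0.44, 0.50],
}
report = summarize_overfitting(example_history)
print(report["best_epoch"], report["overfitting_suspected"])
```

With these made-up numbers, validation loss bottoms out at epoch 3 and then climbs while training loss keeps dropping, so the pattern is flagged.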