This project uses logistic regression to classify cancer cells as benign or malignant. The model reaches an accuracy of 96.50% and is intended to support medical diagnosis and research.
The project uses the Cancer_Data.csv dataset. This dataset contains 569 entries with 30 features, and each entry is labeled as benign (B) or malignant (M).
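As a rough illustration of preparing this dataset, the sketch below drops the 'id' and 'Unnamed: 32' columns, encodes 'diagnosis' as 1 for M and 0 for B, min-max normalizes the features, and splits the rows into training and test sets. It is a minimal sketch, not the repository's code. The function name `load_and_preprocess`, the 80/20 split, the seed, and the NumPy shuffle (standing in for scikit-learn's `train_test_split`) are all assumptions, and the five-row DataFrame is synthetic.

```python
import numpy as np
import pandas as pd


def load_and_preprocess(df, test_size=0.2, seed=42):
    """Clean the raw table and return normalized train/test arrays.

    Hypothetical helper: assumes the column names described in this README
    ('id', 'Unnamed: 32', and 'diagnosis' with values 'M' / 'B').
    """
    # Drop the identifier and the empty trailing column, if present.
    df = df.drop(columns=["id", "Unnamed: 32"], errors="ignore")

    # Encode the label: malignant (M) -> 1, benign (B) -> 0.
    y = df["diagnosis"].map({"M": 1, "B": 0}).to_numpy()
    x = df.drop(columns=["diagnosis"]).to_numpy(dtype=float)

    # Min-max normalization: rescale each feature column to [0, 1].
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard constant columns
    x = (x - x_min) / span

    # Shuffle, then split (a NumPy stand-in for scikit-learn's train_test_split).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(len(y) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]


# With the real file: df = pd.read_csv("Cancer_Data.csv")
# Synthetic stand-in that follows the same column conventions:
demo = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "diagnosis": ["M", "B", "B", "M", "B"],
    "radius_mean": [20.0, 12.0, 11.0, 19.0, 13.0],
    "texture_mean": [25.0, 15.0, 14.0, 24.0, 16.0],
    "Unnamed: 32": [np.nan] * 5,
})
x_train, x_test, y_train, y_test = load_and_preprocess(demo)
print(x_train.shape, x_test.shape)  # (4, 2) (1, 2)
```

This sketch normalizes before splitting, matching the order of the project sections. Computing the minimum and maximum on the training rows only would keep test-set information out of the scaling.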
The project is organized into the following sections:
- Library and Input File: Imports the necessary libraries and loads the dataset.
- Data Loading and Editing: Loads the dataset, removes unnecessary columns ('Unnamed: 32' and 'id'), and converts the 'diagnosis' column to numerical values (1 for 'M', 0 for 'B').
- Normalization: Rescales every feature to the range 0 to 1 (min-max normalization), so that features with large numeric ranges do not dominate those with small ranges during training.
- Train Test Split: Divides the dataset into training and testing sets for model training and evaluation.
- Initialize Weights and Bias: Initializes the weight and bias parameters of the logistic regression model.
- Sigmoid Function: Implements the sigmoid function, which maps the model's linear output to a probability between 0 and 1.
- Forward-Backward Propagation: Computes the predictions and the cost in the forward pass, and the gradients of the cost with respect to the weights and bias in the backward pass.
- Updating Parameters: Updates the weights and bias using the computed gradients and the learning rate.
- Prediction: Makes predictions with the trained model.
- Logistic Regression Algorithm: Runs the complete logistic regression training and evaluates the results, including a confusion matrix.
- Model Result: Monitors training progress and assesses the model's performance.
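The model steps listed above (weight initialization, sigmoid, forward and backward propagation, parameter updates, and prediction) can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the repository's code. Only the `logistic_regression(x_train, y_train, x_test, y_test, learning_rate, num_iterations)` call shape comes from this README. The helper names, zero initialization, mean binary cross-entropy cost, 0.5 decision threshold, and rows-as-samples array layout are assumptions, and the demo data is synthetic.

```python
import numpy as np


def initialize_weights_and_bias(n_features):
    # Zero initialization is a common, simple choice for logistic regression.
    return np.zeros(n_features), 0.0


def sigmoid(z):
    # Map real-valued scores to (0, 1); clipping avoids overflow in np.exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def forward_backward_propagation(w, b, x, y):
    # Forward pass: linear score per sample, then probability of class 1.
    p = sigmoid(x @ w + b)
    eps = 1e-12  # keeps log() finite when p is exactly 0 or 1
    cost = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    # Backward pass: gradients of the mean cross-entropy cost.
    error = p - y
    dw = x.T @ error / len(y)
    db = float(np.mean(error))
    return cost, dw, db


def update(w, b, x, y, learning_rate, num_iterations):
    # Plain batch gradient descent; the cost is recorded to monitor training.
    costs = []
    for _ in range(num_iterations):
        cost, dw, db = forward_backward_propagation(w, b, x, y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
        costs.append(cost)
    return w, b, costs


def predict(w, b, x, threshold=0.5):
    # Label 1 (malignant) when the predicted probability exceeds the threshold.
    return (sigmoid(x @ w + b) > threshold).astype(int)


def logistic_regression(x_train, y_train, x_test, y_test, learning_rate, num_iterations):
    w, b = initialize_weights_and_bias(x_train.shape[1])
    w, b, costs = update(w, b, x_train, y_train, learning_rate, num_iterations)
    accuracy = float(np.mean(predict(w, b, x_test) == y_test))
    print(f"test accuracy: {accuracy * 100:.2f}%")
    return w, b, costs, accuracy


# Synthetic, linearly separable demo data (rows = samples, features in [0, 1]).
rng = np.random.default_rng(0)
x = rng.random((200, 3))
y = (x[:, 0] + x[:, 1] > 1.0).astype(int)
w, b, costs, acc = logistic_regression(
    x[:160], y[:160], x[160:], y[160:], learning_rate=1, num_iterations=300
)
```

The gradients follow from the cross-entropy cost. With p = sigmoid(Xw + b), the derivative of the mean cost is Xᵀ(p − y)/n for the weights and mean(p − y) for the bias. This is why the backward pass only needs the prediction error.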
To run the project, make sure you have the following Python libraries installed:
- NumPy
- pandas
- scikit-learn
- seaborn
- matplotlib
You can install these libraries using pip:
pip install numpy
pip install pandas
pip install scikit-learn
pip install seaborn
pip install matplotlib
- Clone the project repository:
git clone https://github.com/Prometheussx/Kaggle-Notebook-Cancer-Prediction-ACC96.5-With-Logistic-Regression.git
cd Kaggle-Notebook-Cancer-Prediction-ACC96.5-With-Logistic-Regression
- Ensure you have Python and the required libraries installed.
- Download the Cancer_Data.csv dataset and place it in the project directory.
- Follow the code in the "Data Loading and Editing" section to load and preprocess the dataset.
- Run the Python code in the repository files to train and evaluate the logistic regression model, for example:
  logistic_regression(x_train, y_train, x_test, y_test, learning_rate=1, num_iterations=300)
The model's performance is evaluated with metrics such as accuracy and a confusion matrix.
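To make the two metrics concrete, here is a small NumPy sketch that computes accuracy and a 2x2 confusion matrix. The helper names are hypothetical and the labels are made up. scikit-learn's `sklearn.metrics.confusion_matrix` and `accuracy_score` return the same values, with rows as true labels and columns as predicted labels.

```python
import numpy as np


def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels 0 (benign) and 1 (malignant).

    Rows are true labels and columns are predicted labels, the same layout
    as sklearn.metrics.confusion_matrix for labels {0, 1}.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    return np.array([[tn, fp], [fn, tp]])


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# Made-up labels, for illustration only.
y_true = [1, 0, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
cm = confusion_matrix_2x2(y_true, y_pred)
print(cm)                        # [[3 1]
                                 #  [1 3]]
print(accuracy(y_true, y_pred))  # 0.75

# To visualize the matrix with seaborn (listed in the requirements):
# import seaborn as sns; import matplotlib.pyplot as plt
# sns.heatmap(cm, annot=True, fmt="d"); plt.xlabel("predicted"); plt.ylabel("true"); plt.show()
```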
The project attains an accuracy of 96.50% in classifying cancer cells. Training progress and results are visualized in the README.
This project is released under the MIT License.
- Mail Address: Erdem Taha Sokullu
- LinkedIn Profile: Erdem Taha Sokullu
- GitHub Profile: Prometheussx
- Kaggle Profile: @erdemtaha
Feel free to reach out if you have any questions or need further information about the project.