The primary objective of this project is to classify breast tumour biopsy reports into two classes (benign or malignant) using various machine learning models, including Logistic Regression, Support Vector Machine (SVM), and Neural Network.
- Classifying Breast Tumour Dataset: The project uses machine learning models such as Logistic Regression, SVM, and Neural Network to classify biopsy reports as benign or malignant. The dataset comprises 569 biopsy reports, of which 357 are benign and 212 are malignant. The 'diagnosis' column records the outcome of each report, with 'M' indicating a malignant case and 'B' a benign case.
- Model Accuracy Evaluation: The project's secondary goal is to evaluate how accurately these machine learning models classify the biopsy reports. The evaluation compares the models' performance on accuracy metrics.
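
As a point of reference for the accuracy evaluation, the sketch below encodes the 'diagnosis' labels and computes the accuracy of a trivial classifier that always predicts the majority class. It is not taken from the notebook: the 1/0 encoding and the helper names `encode_diagnosis` and `accuracy` are assumptions made for illustration, and the labels are rebuilt from the class counts stated above rather than read from the CSV file.

```python
from collections import Counter

# Assumed encoding (not confirmed by the notebook): 'M' -> 1, 'B' -> 0.
LABEL_TO_INT = {"M": 1, "B": 0}


def encode_diagnosis(labels):
    """Map 'M'/'B' strings from the 'diagnosis' column to 1/0."""
    return [LABEL_TO_INT[label] for label in labels]


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Rebuild the label list from the class counts given in this README:
# 357 benign and 212 malignant reports.
y_true = encode_diagnosis(["B"] * 357 + ["M"] * 212)
class_counts = Counter(y_true)

# Baseline: always predict the most frequent class (benign).
majority_class = class_counts.most_common(1)[0][0]
baseline_accuracy = accuracy(y_true, [majority_class] * len(y_true))
print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

Because the classes are imbalanced (357 of 569 reports are benign), always predicting benign already scores about 62.7% accuracy, so a model's accuracy is more informative when read against this baseline.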
The provided dataset consists of fine needle aspiration biopsy reports. Such biopsies aid in diagnosing abnormal tissue or fluid conditions, such as cancer. The dataset describes each cell nucleus with 10 real-valued features: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. For each of these features, the mean, the standard error, and the "worst" (largest) value were computed, resulting in a total of 30 features for each image.
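
The arithmetic behind the 30 features (10 base measurements times 3 statistics) can be sketched as follows. The column-naming scheme (`radius_mean`, `radius_se`, `radius_worst`, and so on) is an assumption for illustration; the actual headers in "breast cancer.csv" may differ.

```python
# The ten base measurements listed in this README.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]

# The three statistics computed for each measurement.
# The short suffixes are hypothetical, chosen only for this sketch.
STATISTICS = ["mean", "se", "worst"]


def build_feature_names(base_features=BASE_FEATURES, statistics=STATISTICS):
    """Combine every statistic with every base measurement."""
    return [
        f"{feature.replace(' ', '_')}_{stat}"
        for stat in statistics
        for feature in base_features
    ]


feature_names = build_feature_names()
print(len(feature_names))  # 10 measurements x 3 statistics
print(feature_names[:3])
```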
To execute the project:
- Ensure that the dataset file "breast cancer.csv" resides in the same directory as the notebook 21CS10052_Code.ipynb.
- Run all the cells within the Jupyter Notebook in sequential order.
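
Before running the notebook, the first step above can be checked with a short script. This is an optional sketch rather than part of the project; the helper name `missing_project_files` is hypothetical, and the script assumes it is run from the project directory.

```python
from pathlib import Path

# File names as given in this README.
REQUIRED_FILES = ("breast cancer.csv", "21CS10052_Code.ipynb")


def missing_project_files(directory, required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in required if not (directory / name).is_file()]


missing = missing_project_files(Path.cwd())
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files are present.")
```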
A sample output demonstrating the project's results is available in the file Output.pdf for reference.