- About
- Motivation
- Features
- Datasets
- Machine Learning Algorithms
- Evaluation Metrics
- Usage
- Results
- Contributing
- License
This repository contains the source code and resources for my software defect prediction project, which explores how machine learning algorithms can identify software defects early, with the goal of improving software quality and reducing maintenance costs.
Software defects can significantly affect software quality, user satisfaction, and maintenance effort. This project was motivated by the goal of using machine learning to predict where defects are likely to occur, helping software developers address potential issues proactively.
- Comparative analysis of 10 machine learning algorithms
- Addressing class imbalance using Random Over-Sampling (ROS)
- Evaluation using metrics such as accuracy, precision, recall, and AUC-ROC
- Stratified cross-validation with 10 splits using scikit-learn's RepeatedStratifiedKFold
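To make the combination of Random Over-Sampling and stratified cross-validation concrete, here is a minimal, dependency-free sketch. It is illustrative only: the helper names (stratified_kfold_indices, repeated_stratified_kfold, random_over_sample) are hypothetical and are not scikit-learn or imbalanced-learn APIs, and the default of 3 repeats is an assumption, since the project's repeat count is not stated. The sketch over-samples only the training portion of each split, which keeps duplicated minority samples out of the test fold; whether the project's notebooks do the same is not stated here.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each test fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_splits)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's shuffled indices round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    for fold in folds:
        test = sorted(fold)
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=3, seed=0):
    """Repeat the stratified split, reshuffling with a different seed each time."""
    for r in range(n_repeats):
        yield from stratified_kfold_indices(labels, n_splits, seed + r)


def random_over_sample(indices, labels, seed=0):
    """Random Over-Sampling: duplicate randomly chosen samples of each smaller
    class until every class matches the largest class within `indices`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    resampled = []
    for idx in by_class.values():
        extra = [rng.choice(idx) for _ in range(target - len(idx))]
        resampled.extend(idx + extra)
    return resampled


# Hypothetical data: 100 modules, 10 of them defective (label 1).
labels = [0] * 90 + [1] * 10
for train, test in stratified_kfold_indices(labels, n_splits=10):
    balanced_train = random_over_sample(train, labels)
    # A model would be fit on balanced_train and scored on the untouched test fold.
    print(len(test), sum(labels[i] for i in test), len(balanced_train))  # prints "10 1 162" per fold
```

Each test fold keeps the 9:1 class ratio, while each training split grows from 81 clean and 9 defective modules to 81 of each.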
The project utilizes 7 publicly available datasets:
- CM1
- JM1
- KC1
- KC2
- MC1
- MW1
- PC1
The following algorithms are evaluated in this project:
- Logistic Regression
- XGBoost
- AdaBoost
- Voting Classifier
- Random Forest
- Decision Tree
- SVM
- Gradient Boosting
- KNN
- Bagging Classifier
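Unlike the other entries, the Voting Classifier is an ensemble that combines the predictions of other models. scikit-learn's VotingClassifier supports voting="hard" (majority vote on labels) and voting="soft" (averaged predicted probabilities). Which base estimators and which mode the project uses are not stated, so the following is only a plain-Python sketch of both rules. The function names hard_vote and soft_vote are hypothetical, and the 0.5 threshold is an assumption.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote over per-model label lists. Counter.most_common orders
    equal counts by first appearance, so a tie goes to the label predicted
    by the earliest model in the list."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for j in range(n_samples):
        votes = Counter(preds[j] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


def soft_vote(probas_per_model, threshold=0.5):
    """Average each model's predicted defect probability, then apply the threshold."""
    n_models = len(probas_per_model)
    n_samples = len(probas_per_model[0])
    return [
        int(sum(p[j] for p in probas_per_model) / n_models >= threshold)
        for j in range(n_samples)
    ]


# Hypothetical outputs from three base models.
print(hard_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]))  # [1, 1, 1, 0]
print(soft_vote([[0.9, 0.2], [0.6, 0.4], [0.1, 0.3]]))        # [1, 0]
```

Soft voting can overrule a majority when one model is very confident, which is why it generally requires base models that produce probability estimates.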
The project uses the following evaluation metrics:
- Prediction Accuracy
- Precision
- Recall
- AUC-ROC
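scikit-learn provides these metrics as accuracy_score, precision_score, recall_score, and roc_auc_score in sklearn.metrics. The project's exact calls are not shown here, so this dependency-free sketch only illustrates what each metric computes, assuming label 1 marks a defective module. The helper names are hypothetical. AUC-ROC is computed through its rank interpretation: the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one, with ties counted as half. The pairwise loop is quadratic, which is fine for illustration but slow on large datasets.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), where `positive` marks the defective class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    """Share of modules flagged as defective that really are; 0.0 if none flagged."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    """Share of truly defective modules that were flagged; 0.0 if none exist."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def auc_roc(y_true, scores, positive=1):
    """Fraction of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one sample of each class")
    wins = 0.0
    for ps in pos:
        for ns in neg:
            if ps > ns:
                wins += 1
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions, and predicted defect probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.1, 0.4, 0.8, 0.6, 0.2]
print(round(accuracy(y_true, y_pred), 3))   # 0.667
print(round(precision(y_true, y_pred), 3))  # 0.667
print(round(recall(y_true, y_pred), 3))     # 0.667
print(round(auc_roc(y_true, scores), 3))    # 0.889
```

On imbalanced defect data, accuracy alone can look high even when few defects are caught, which is why precision, recall, and AUC-ROC are reported alongside it.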
- Clone the repository:
git clone https://github.com/yourusername/software-defect-prediction.git
- Install the dependencies:
pip install -r requirements.txt
- Open each folder to access its main.ipynb notebook.