This project is a machine learning system that predicts the likelihood of anemia in individuals from clinical and demographic data. It was built as a mini project for the Data Science curriculum.
Anemia is a common blood disorder that can lead to fatigue, weakness, and serious health complications if left untreated. Early detection can significantly improve outcomes. This project aims to build and evaluate machine learning models to predict anemia status from available health data.
The project includes:
- Data preprocessing and cleaning
- Exploratory Data Analysis (EDA)
- Training multiple machine learning models
- Evaluation using metrics such as accuracy, precision, recall, F1-score, and AUC
- Model selection based on performance
- Final model ready for further integration or deployment
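
The preprocessing step listed above could be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn and NumPy are installed, and the two feature columns (hemoglobin and age) and their values are hypothetical stand-ins for whatever the real dataset contains.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with hypothetical columns: hemoglobin (g/dL), age (years).
# np.nan marks a missing measurement.
X_raw = np.array([
    [13.5, 34.0],
    [np.nan, 51.0],
    [9.8, 27.0],
    [11.2, np.nan],
    [15.1, 62.0],
])

preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),  # fill gaps with the column median
    ("scale", StandardScaler()),                   # zero mean, unit variance per column
])

X_clean = preprocess.fit_transform(X_raw)
print(X_clean.round(2))
```

Wrapping both steps in a `Pipeline` means the same imputation and scaling parameters learned on the training data can be reapplied to new data with `preprocess.transform(...)`.
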

The following models were trained:
- Logistic Regression
- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
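
A rough sketch of how these five models might be trained side by side with scikit-learn is shown below. Everything specific here is an assumption: the data is synthetic (generated with `make_classification` as a stand-in for the anemia dataset), the 80/20 split is arbitrary, and the hyperparameters are mostly library defaults rather than the values used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the anemia dataset (assumption: binary target).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale-sensitive models get a StandardScaler in front; tree models do not need it.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)                         # train on the training split
    test_accuracy[name] = model.score(X_test, y_test)   # accuracy on held-out data
    print(f"{name}: accuracy={test_accuracy[name]:.3f}")
```

Keeping the models in a dictionary makes it easy to loop over them with identical training and evaluation code, so the comparison stays fair.
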
The models were evaluated using the following metrics:
- Accuracy
- Precision
- Recall
- F1-Score
- AUC (Area Under ROC Curve)
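
To make the definitions of these metrics concrete, here is a small from-scratch sketch in plain Python. The labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and in practice the equivalent functions in `sklearn.metrics` would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many are found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: true labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(metrics)  # accuracy, precision, recall and f1 are all 0.75 here
print(auc)      # 0.9375
```

Note that accuracy, precision, recall, and F1 depend on the chosen threshold, while AUC is computed from the raw scores and summarizes ranking quality across all thresholds.
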

Of the models tested, Random Forest emerged as the best performer, with:
- The highest accuracy among the models compared
- A strong AUC score, indicating that it separates anemic from non-anemic cases well across decision thresholds
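
One simple way to express a selection rule like this in code is to rank models by accuracy and break ties by AUC. The sketch below is only an illustration: the `select_best` helper is hypothetical, and the scores are invented placeholders, not results from this project.

```python
def select_best(results, primary="accuracy", tiebreak="auc"):
    """Return the model name with the highest primary metric, using tiebreak on ties."""
    return max(results, key=lambda name: (results[name][primary], results[name][tiebreak]))


# Invented placeholder scores for illustration only; NOT results from this project.
example_results = {
    "model_a": {"accuracy": 0.90, "auc": 0.93},
    "model_b": {"accuracy": 0.92, "auc": 0.95},
    "model_c": {"accuracy": 0.92, "auc": 0.91},
}

best = select_best(example_results)
print(best)  # model_b: ties model_c on accuracy, wins on AUC
```

For a medical screening task, it may be worth passing `primary="recall"` instead (with recall included in the results), since missing a true anemia case is often costlier than a false alarm.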