This project aims to classify fetal health using features extracted from Cardiotocogram (CTG) data. It categorizes fetal health status into three classes: Normal, Suspect, and Pathological. The goal is to enhance the accuracy and reliability of fetal health assessments, providing healthcare professionals with an effective tool for early diagnosis.
- Objectives
- Data Source and Description
- Data Preparation
- Exploratory Data Analysis (EDA)
- Machine Learning Models
- Results
- Usage
- Documentation
- Problem Objective: Monitor fetal health using CTG data.
- Research Questions:
  - Can we accurately classify fetal health using CTG data?
  - Which machine learning algorithms are most effective for this classification task?
  - What are the most important features for predicting fetal health?
- The dataset is publicly available on Kaggle: Fetal Health Classification.
- The dataset contains 2,126 records of features derived from CTG exams. Expert obstetricians labeled each record as one of three classes: Normal, Suspect, or Pathological.
- Features: baseline fetal heart rate (FHR), accelerations, fetal movement, uterine contractions, light decelerations, severe decelerations, prolonged decelerations, abnormal short-term variability, histogram metrics, and others.
- Target variable: 'fetal_health', coded as 1 (Normal), 2 (Suspect), or 3 (Pathological).
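
The label coding above can be made explicit with a small helper. This is a minimal sketch, not code from the notebook: the names `FETAL_HEALTH_LABELS`, `decode_label`, and `class_distribution` are hypothetical, and accepting float-like strings such as "2.0" is an assumption about how the target might be stored in a CSV export.

```python
from collections import Counter

# Codes as documented for the 'fetal_health' column.
FETAL_HEALTH_LABELS = {1: "Normal", 2: "Suspect", 3: "Pathological"}


def decode_label(code):
    """Translate a numeric fetal_health code (1, 2.0, "3.0", ...) to its class name."""
    return FETAL_HEALTH_LABELS[int(float(code))]


def class_distribution(codes):
    """Count how many records fall into each named class."""
    return Counter(decode_label(code) for code in codes)


if __name__ == "__main__":
    sample_codes = [1, 1, 2, 3, 1, "2.0"]  # made-up codes, not real data
    print(class_distribution(sample_codes))
```

Printing the distribution before modeling is a quick way to see how unevenly the classes are represented.
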
- The dataset did not contain missing values. Standardization was applied to ensure consistency in feature scales.
- Applied SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance, thereby enhancing the performance of models in predicting minority classes.
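
The two preparation steps above can be sketched with the standard library alone. This is an illustration of the ideas, not the notebook's code: `standardize` and `smote_like` are hypothetical names, the neighbour count `k=2` and the fixed seed are arbitrary assumptions, and a real pipeline would more likely rely on a library implementation such as `imblearn.over_sampling.SMOTE`.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population std; constant columns get a divisor of 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [point for point in minority if point is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # made-up points
    print(smote_like(toy_minority, n_new=3))
```

Oversampling is normally applied to the training split only, so that synthetic points never leak into the data used for evaluation.
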
- Histograms and Box Plots: Used to understand feature distributions and identify significant outliers.
- Correlation Matrix: Visualized using a heatmap to understand feature relationships. Highly correlated features were dropped to avoid redundancy.
- Features like prolonged decelerations and abnormal short-term variability were found to be positively correlated with fetal health issues.
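
Dropping highly correlated features, as described above, might look like the following sketch. The 0.9 threshold, the greedy keep-the-first rule, and the names `pearson` and `drop_correlated` are all assumptions for illustration; the source does not state which cutoff or procedure was actually used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if its absolute correlation with every
    already-kept feature is at most `threshold`.

    columns: dict mapping feature name -> list of values.
    Returns the list of kept feature names, in their original order.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) <= threshold for other in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    toy = {  # hypothetical feature names and made-up values
        "feature_a": [1, 2, 3, 4],
        "feature_b": [2, 4, 6, 8.1],  # nearly a multiple of feature_a
        "feature_c": [4, 1, 3, 2],
    }
    print(drop_correlated(toy))
```

The greedy rule keeps the first feature of each correlated pair, so the result depends on column order.
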
- K-Nearest Neighbors (KNN)
- Gaussian Naive Bayes
- Random Forest
- Gradient Boosting
- Logistic Regression
- Linear Discriminant Analysis (LDA)
- Neural Network (MLP)
- Support Vector Machine (SVM)
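
To make one of the listed models concrete, here is a minimal K-Nearest Neighbors classifier written from scratch. It is a sketch for intuition only: `knn_predict` is a hypothetical name, `k=3` and Euclidean distance are assumptions, and the project itself presumably uses library implementations with their own hyperparameters.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among the k training
    points closest to it in Euclidean distance."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up, already standardized 2-D points; labels follow the 1/2/3 coding.
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = [1, 1, 1, 3, 3, 3]
    print(knn_predict(X, y, [0.2, 0.2]))
```

Because KNN relies on distances, it benefits directly from standardized features.
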
- Random Forest: Achieved an accuracy of 94%, excelling across all classes.
- Gradient Boosting: Demonstrated strong performance, particularly in distinguishing between classes.
- Neural Networks: Provided balanced precision and recall, suitable for capturing complex non-linear relationships.
- Classification reports and confusion matrices highlighted the strong performance of Random Forest and Gradient Boosting, especially in identifying Normal and Pathological cases.
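
For readers unfamiliar with these evaluation tools, the sketch below shows how a confusion matrix and per-class precision and recall are derived from true and predicted labels. The function names are hypothetical and the labels in the demo are made up; they are not the project's results.

```python
def confusion_matrix(y_true, y_pred, labels=(1, 2, 3)):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def precision_recall(matrix):
    """Return a (precision, recall) pair for each class, in label order."""
    results = []
    for i in range(len(matrix)):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        results.append((precision, recall))
    return results


if __name__ == "__main__":
    y_true = [1, 1, 1, 2, 2, 3]  # made-up labels for illustration
    y_pred = [1, 1, 2, 2, 2, 3]
    cm = confusion_matrix(y_true, y_pred)
    for row in cm:
        print(row)
    print(precision_recall(cm))
```

Per-class recall is worth watching on imbalanced data, since overall accuracy can stay high while a minority class is frequently missed.
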
To run the project:
- Clone the repository:
git clone https://github.com/YOUR_USERNAME/Fetal-Health-Classification.git
- Navigate to the directory:
cd Fetal-Health-Classification
- Install the required dependencies:
pip install -r requirements.txt
- Run the Jupyter notebook for analysis:
jupyter notebook Fetal_Health.ipynb
For a more detailed analysis and discussion, see the Summary Report (PDF).