This project analyzes a synthetic dataset of sleep, cardiovascular, and lifestyle metrics for nearly 400 fictional individuals. The analysis covers sleep patterns, physical activity, and stress levels, and how these factors relate to sleep health.
The dataset is provided in a CSV file, data.csv, containing columns like Person ID, Gender, Age, Occupation, and various health and lifestyle metrics.
The dataset is sourced from Kaggle: Sleep Health and Lifestyle Dataset (https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset/).
- Objective: Develop a classifier to predict sleep disorders based on lifestyle and health metrics.
- Background: A health insurance company wants to set client premiums based on the likelihood of sleep disorders.
The EDA includes visualizations and analyses such as:
- Relationship between occupation and sleep patterns.
- Relationship between age, sleep duration, and sleep disorders.
- Sleep quality across different sleep disorders.
- Gender distribution in sleep disorders.
- Impact of physical activity on sleep health.
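
As a rough illustration of the kind of EDA summaries listed above, here is a minimal pandas sketch. It runs on a tiny made-up sample rather than on data.csv, and the column names `Sleep Duration` and `Sleep Disorder`, as well as the disorder labels, are assumptions; only Gender and Occupation are named in this README.

```python
import pandas as pd

# Made-up sample rows for illustration only; not taken from data.csv.
# "Sleep Duration" and "Sleep Disorder" are assumed column names.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Occupation": ["Engineer", "Nurse", "Teacher", "Engineer", "Nurse", "Teacher"],
    "Sleep Duration": [7.0, 6.0, 6.5, 8.0, 6.4, 7.5],
    "Sleep Disorder": ["None", "Insomnia", "None", "None", "Sleep Apnea", "Insomnia"],
})

# Average sleep duration per occupation, shortest sleepers first.
mean_sleep_by_job = df.groupby("Occupation")["Sleep Duration"].mean().sort_values()

# Count of each sleep disorder label per gender.
disorder_by_gender = pd.crosstab(df["Gender"], df["Sleep Disorder"])

print(mean_sleep_by_job)
print(disorder_by_gender)
```

The same two calls can be pointed at the real DataFrame once the column names are checked against the CSV header.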
Data preprocessing includes:
- Standardization and encoding of variables.
- Reduction of multicollinearity.
- Preparing the dataset for predictive modeling.
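
The preprocessing steps above could look roughly like the following sketch. It is not the notebook's actual code: the sample rows are made up, the 0.9 correlation threshold and the `drop_correlated` helper are hypothetical, and every column name other than Gender, Age, and Occupation is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up sample rows for illustration only; not taken from data.csv.
# Columns other than Gender, Age, and Occupation are assumed names.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Age": [41, 29, 52, 35, 47, 31],
    "Occupation": ["Engineer", "Nurse", "Teacher", "Doctor", "Nurse", "Engineer"],
    "Sleep Duration": [6.1, 7.2, 6.8, 7.9, 6.0, 7.5],
    "Quality of Sleep": [6, 8, 7, 9, 5, 8],
    "Daily Steps": [4200, 8000, 6500, 7000, 3000, 9000],
})


def drop_correlated(frame, threshold=0.9):
    """Drop the later column of each pair whose absolute correlation exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return frame.drop(columns=to_drop), to_drop


# 1. Reduce multicollinearity among the numeric columns.
numeric_cols = df.select_dtypes(include="number").columns
numeric, dropped = drop_correlated(df[numeric_cols])

# 2. Standardize the remaining numeric columns (zero mean, unit variance).
scaled = pd.DataFrame(
    StandardScaler().fit_transform(numeric),
    columns=numeric.columns,
    index=df.index,
)

# 3. One-hot encode the categorical columns.
encoded = pd.get_dummies(df[["Gender", "Occupation"]], drop_first=True)

# 4. Assemble the feature matrix for modeling.
X = pd.concat([scaled, encoded], axis=1)
print("Dropped:", dropped)
print(X.head())
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so that test data does not leak into the scaling statistics.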
Modeling includes:
- Baseline model establishment.
- Hyperparameter tuning using Grid Search.
- Optimized Decision Tree Classifier.
- Confusion Matrix Analysis for performance interpretation.
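
A minimal scikit-learn sketch of the modeling steps above follows. It uses synthetic features from `make_classification` as a stand-in for the preprocessed dataset, and several choices are assumptions rather than the notebook's settings: three target classes, a majority-class `DummyClassifier` as the baseline, and the particular parameter grid.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features (assumption: 3 classes).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: always predict the most frequent class (assumed baseline choice).
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Grid search over a small, illustrative decision tree parameter grid.
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# Evaluate the tuned tree on the held-out split.
best_tree = search.best_estimator_
tuned_acc = best_tree.score(X_test, y_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, best_tree.predict(X_test))

print("Baseline accuracy:", baseline_acc)
print("Tuned tree accuracy:", tuned_acc)
print("Best parameters:", search.best_params_)
print(cm)
```

Off-diagonal cells of the confusion matrix show which classes the tree confuses with each other, which matters here because different kinds of misclassification can carry different costs when setting premiums.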
The project concludes with insights into model performance and its application in medical decision-making.
- Clone the repository.
- Install required dependencies.
- Run Jupyter Notebooks to explore the dataset and models.