pareek-ml/world-happiness-data-analysis

World Happiness Report 2023 Analysis

This project analyzes the World Happiness Report 2023 dataset to understand factors contributing to happiness across different countries.

Project Structure

├── config/                 # Configuration files
├── data/                   # Data directory
│   ├── raw/               # Raw data files
│   └── processed/         # Processed data files
├── notebooks/             # Jupyter notebooks
├── src/                   # Source code
│   ├── data/             # Data processing scripts
│   ├── features/         # Feature engineering
│   ├── models/           # Model training and evaluation
│   └── visualization/    # Visualization scripts
└── results/              # Output results and visualizations
    ├── models/           # Trained models
    └── visualizations/   # Generated visualizations

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)

Setup

  1. Clone the repository:
git clone <repository-url>
cd world-happiness-data-analysis
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

Running the Complete Pipeline

To run the entire pipeline (download data, preprocess, visualize, and train model):

python src/main.py

Running Individual Components

  1. Download the dataset:
python src/data/download_data.py
  2. Preprocess the data:
python src/data/preprocess.py
  3. Create visualizations:
python src/visualization/create_visualizations.py
  4. Train the model:
python src/models/train.py
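
The individual steps can also be chained from a small helper. The following is a minimal sketch, not the contents of src/main.py: the `STEPS` list and the `run_steps` function are hypothetical names, and the order is assumed from the pipeline description above (download, preprocess, visualize, train).

```python
import subprocess
import sys

# Script paths as listed in this README; the order is an assumption.
STEPS = [
    "src/data/download_data.py",
    "src/data/preprocess.py",
    "src/visualization/create_visualizations.py",
    "src/models/train.py",
]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each script in order and stop at the first non-zero exit code.

    Returns the list of scripts that completed successfully.
    """
    completed = []
    for script in steps:
        # Use the current interpreter so the active virtual environment is respected.
        result = runner([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}", file=sys.stderr)
            break
        completed.append(script)
    return completed
```

Calling `run_steps()` from the repository root runs all four scripts. Stopping at the first failure avoids training on data that was never downloaded or preprocessed.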

Jupyter Notebook Analysis

For interactive analysis:

jupyter notebook notebooks/01_exploratory_data_analysis.ipynb

Project Components

Data Preprocessing

  • Data cleaning and validation
  • Missing value handling
  • Feature scaling
  • Train-test split
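
The steps above could look roughly like the sketch below. It is not the code in src/data/preprocess.py: the column names, the synthetic data, median imputation, and the 80/20 split are all assumptions chosen for illustration. The data is split first so that imputation and scaling statistics come from the training rows only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the raw data; column names are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "gdp": rng.normal(9.5, 1.0, 50),
    "social_support": rng.uniform(0.4, 1.0, 50),
    "happiness_score": rng.uniform(2.0, 8.0, 50),
})
df.loc[[3, 17], "gdp"] = np.nan  # simulate missing values


def preprocess(frame, target="happiness_score", test_size=0.2, seed=42):
    """Split, impute with training medians, then standardize the features."""
    X = frame.drop(columns=[target])
    y = frame[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Fit imputation and scaling on the training split only to avoid leakage.
    medians = X_train.median()
    X_train = X_train.fillna(medians)
    X_test = X_test.fillna(medians)

    scaler = StandardScaler().fit(X_train)
    X_train = pd.DataFrame(scaler.transform(X_train), columns=X.columns, index=X_train.index)
    X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)
    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = preprocess(df)
print(X_train.shape, X_test.shape)
```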

Exploratory Data Analysis (EDA)

  • Statistical summaries
  • Distribution analysis
  • Correlation analysis
  • Feature importance visualization
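
As a rough illustration of the summary and correlation steps, the sketch below uses pandas on synthetic data. The column names and the relationship between `gdp` and `happiness_score` are made up for the example and do not reflect the actual dataset or notebook.

```python
import numpy as np
import pandas as pd

# Synthetic data: happiness_score is constructed to depend on gdp (an assumption).
rng = np.random.default_rng(1)
gdp = rng.normal(9.5, 1.0, 100)
df = pd.DataFrame({
    "gdp": gdp,
    "social_support": rng.uniform(0.4, 1.0, 100),
    "happiness_score": 0.7 * gdp + rng.normal(0.0, 0.3, 100),
})

# Statistical summary: count, mean, std, min, quartiles, max per column.
summary = df.describe()

# Pearson correlation matrix (the pandas default).
corr = df.corr()

# Rank features by the strength of their correlation with the target.
ranked = (
    corr["happiness_score"]
    .drop("happiness_score")
    .abs()
    .sort_values(ascending=False)
)
print(summary.loc[["mean", "std"]])
print(ranked)
```

Ranking by absolute correlation is only a first look: it captures linear relationships and says nothing about causation.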

Feature Engineering

  • Feature scaling
  • Feature selection
  • Feature importance analysis

Predictive Modeling

  • Random Forest model
  • Hyperparameter optimization using Optuna
  • Model evaluation metrics
  • Feature importance analysis

Visualization

  • Happiness score distribution
  • Correlation heatmap
  • Top/bottom countries analysis
  • Regional analysis
  • Feature importance plots
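
Two of the plots listed above, the score distribution and the correlation heatmap, can be sketched with plain matplotlib as shown below. The function name `save_overview_plots`, the output file names, and the column names are hypothetical, and the actual scripts in src/visualization/ may use a different plotting library.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def save_overview_plots(df, target, out_dir):
    """Save a histogram of `target` and a correlation heatmap; return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Happiness score distribution.
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(df[target], bins=15, edgecolor="black")
    ax.set_xlabel(target)
    ax.set_ylabel("Number of countries")
    ax.set_title("Happiness score distribution")
    hist_path = out_dir / "score_distribution.png"
    fig.savefig(hist_path, dpi=150, bbox_inches="tight")
    plt.close(fig)

    # Correlation heatmap over the numeric columns only.
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(5, 4))
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(image, ax=ax, label="Pearson correlation")
    heatmap_path = out_dir / "correlation_heatmap.png"
    fig.savefig(heatmap_path, dpi=150, bbox_inches="tight")
    plt.close(fig)

    return hist_path, heatmap_path


# Synthetic demo data; column names are illustrative assumptions.
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "country": [f"Country {i}" for i in range(60)],
    "gdp": rng.normal(9.5, 1.0, 60),
    "happiness_score": rng.uniform(2.0, 8.0, 60),
})
```

For example, `save_overview_plots(demo, "happiness_score", "results/visualizations")` would write both PNG files into the results directory described in this README.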

Results

The project generates various outputs in the results/ directory:

  • results/models/: Trained models and feature importance
  • results/visualizations/: Generated plots and interactive visualizations
  • pipeline.log: Detailed logging of the pipeline execution

Troubleshooting

  1. Data Download Issues

    • Check your internet connection
    • Verify the URL in src/data/download_data.py
    • Ensure write permissions in the data directory
  2. Preprocessing Errors

    • Verify the raw data file exists in data/raw/
    • Check for correct column names in the configuration
    • Ensure sufficient disk space
  3. Model Training Issues

    • Verify processed data exists
    • Check for memory constraints
    • Adjust model parameters in config/config.yaml
  4. Visualization Problems

    • Ensure all required packages are installed
    • Check for write permissions in the results directory
    • Verify data format and column names
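
Several of the checks above can be automated before running the pipeline. The following sketch uses only the standard library; the `preflight` function and the 100 MB free-space threshold are hypothetical and not part of the repository, and the paths are the ones named in this README.

```python
import os
import shutil
from pathlib import Path


def preflight(project_root=".", min_free_mb=100):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    root = Path(project_root)
    if not root.is_dir():
        return [f"project root {root} does not exist"]

    problems = []

    # Preprocessing needs at least one file in data/raw/.
    raw_dir = root / "data" / "raw"
    if not raw_dir.is_dir() or not any(raw_dir.iterdir()):
        problems.append(f"no raw data files found in {raw_dir}")

    # Download and visualization steps need write access to these directories.
    for name in ("data", "results"):
        target = root / name
        if target.exists() and not os.access(target, os.W_OK):
            problems.append(f"{target} is not writable")

    # Model parameters are read from config/config.yaml.
    if not (root / "config" / "config.yaml").is_file():
        problems.append("config/config.yaml is missing")

    # Rough disk-space check (threshold is an arbitrary assumption).
    free_mb = shutil.disk_usage(root).free / 1_000_000
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb:.0f} MB free, expected at least {min_free_mb} MB")

    return problems
```

Running `preflight()` from the repository root and printing the returned list gives a quick overview of what to fix first. It does not check the internet connection, column names, or installed packages.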

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
