This project analyzes the World Happiness Report 2023 dataset to understand factors contributing to happiness across different countries.
├── config/ # Configuration files
├── data/ # Data directory
│ ├── raw/ # Raw data files
│ └── processed/ # Processed data files
├── notebooks/ # Jupyter notebooks
├── src/ # Source code
│ ├── data/ # Data processing scripts
│ ├── features/ # Feature engineering
│ ├── models/ # Model training and evaluation
│ └── visualization/ # Visualization scripts
└── results/ # Output results and visualizations
  ├── models/ # Trained models
  └── visualizations/ # Generated visualizations
- Python 3.8 or higher
- pip (Python package installer)
- Clone the repository:
    git clone <repository-url>
    cd world-happiness
- Create a virtual environment:
    python -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
To run the entire pipeline (download data, preprocess, visualize, and train the model):
    python src/main.py
To run individual steps instead:
- Download the dataset:
    python src/data/download_data.py
- Preprocess the data:
    python src/data/preprocess.py
- Create visualizations:
    python src/visualization/create_visualizations.py
- Train the model:
    python src/models/train.py
For interactive analysis:
    jupyter notebook notebooks/01_exploratory_data_analysis.ipynb
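The four step scripts listed above can also be chained from Python. The following is a minimal sketch, not the contents of the project's src/main.py: it assumes the scripts are run from the repository root in the order given above, and `build_commands`, `run_pipeline`, and the `dry_run` flag are hypothetical names introduced for illustration.

```python
import subprocess
import sys

# Script paths as listed above, in the order the steps are described.
PIPELINE_STEPS = [
    "src/data/download_data.py",
    "src/data/preprocess.py",
    "src/visualization/create_visualizations.py",
    "src/models/train.py",
]


def build_commands(steps=PIPELINE_STEPS, python=sys.executable):
    """Return one command (an argument list) per pipeline step."""
    return [[python, step] for step in steps]


def run_pipeline(dry_run=False):
    """Run each step in order, stopping at the first failure.

    With dry_run=True (hypothetical flag) the commands are only printed.
    """
    for command in build_commands():
        print("Running:", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)
```

Calling `run_pipeline(dry_run=True)` prints the commands without executing them, which is a cheap way to confirm the paths before a full run.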
- Data cleaning and validation
- Missing value handling
- Feature scaling
- Train-test split
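The preprocessing steps above (missing value handling, feature scaling, train-test split) can be sketched in plain Python. This is an illustration under assumptions, not the code in src/data/preprocess.py: median imputation, z-score scaling, the 20% test fraction, and the seed are all choices made for this example, and the function names are hypothetical.

```python
import random
import statistics


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

When scaling feeds a model, the mean and standard deviation are usually computed on the training rows only and then reused for the test rows, so that no information from the test set leaks into training.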
- Statistical summaries
- Distribution analysis
- Correlation analysis
- Feature importance visualization
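For the correlation analysis, a dependency-free sketch of the Pearson coefficient may help make the idea concrete. It assumes Pearson correlation is the measure of interest (the README does not say which one the project uses), and the function names and the dict-of-columns input format are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each (name_a, name_b) pair to its Pearson correlation.

    `columns` is a dict of column name -> list of numbers.
    """
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

The resulting dictionary holds the same values a correlation heatmap would display, keyed by column pair.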
- Feature scaling
- Feature selection
- Feature importance analysis
- Random Forest model
- Hyperparameter optimization using Optuna
- Model evaluation metrics
- Feature importance analysis
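The README does not list which evaluation metrics the training step reports. Assuming the happiness score is predicted as a numeric target, common choices are MAE, RMSE, and R², sketched below in plain Python. The function name and the returned keys are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R^2 is undefined when the targets are constant.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "rmse": math.sqrt(mse), "r2": r2}
```

An R² of 0 means the model does no better than always predicting the mean of the targets, which is a useful baseline when reading the results.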
- Happiness score distribution
- Correlation heatmap
- Top/bottom countries analysis
- Regional analysis
- Feature importance plots
The project generates various outputs in the results/ directory:
- results/models/: Trained models and feature importance
- results/visualizations/: Generated plots and interactive visualizations
- pipeline.log: Detailed logging of the pipeline execution
- Data Download Issues
  - Check your internet connection
  - Verify the URL in src/data/download_data.py
  - Ensure write permissions in the data directory
- Preprocessing Errors
  - Verify the raw data file exists in data/raw/
  - Check for correct column names in the configuration
  - Ensure sufficient disk space
- Model Training Issues
  - Verify processed data exists
  - Check for memory constraints
  - Adjust model parameters in config/config.yaml
- Visualization Problems
  - Ensure all required packages are installed
  - Check for write permissions in the results directory
  - Verify data format and column names
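Several of the checks above can be automated before a run. The sketch below is an assumption-laden helper, not part of the project: `preflight_check` is a hypothetical name, and it only covers the file-system checks (raw data present, config/config.yaml present, data and results directories writable), using the paths mentioned in this README.

```python
import os
from pathlib import Path


def preflight_check(root="."):
    """Return a list of human-readable problems; empty means all checks passed."""
    root = Path(root)
    problems = []

    # Preprocessing needs at least one file in data/raw/.
    raw_dir = root / "data" / "raw"
    if not raw_dir.is_dir() or not any(raw_dir.iterdir()):
        problems.append(f"no raw data files found in {raw_dir}")

    # Model parameters are read from config/config.yaml.
    config_file = root / "config" / "config.yaml"
    if not config_file.is_file():
        problems.append(f"missing configuration file {config_file}")

    # Download and visualization steps write into these directories.
    for name in ("data", "results"):
        directory = root / name
        if not directory.is_dir():
            problems.append(f"directory {directory} does not exist")
        elif not os.access(directory, os.W_OK):
            problems.append(f"directory {directory} is not writable")

    return problems
```

Running `preflight_check()` from the repository root and printing the returned list gives a quick summary of what to fix before starting the pipeline.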
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.