OurWorldInData CCA Analysis

This project performs a Canonical Correlation Analysis (CCA) on socio-economic and demographic data from Our World in Data. The primary objective is to investigate the relationships between objectively measurable factors (e.g., meat supply, university enrollment) and subjective, self-reported indicators (e.g., happiness, trust levels) across 22 different countries.

Project Overview

The project looks for shared structure between two seemingly disparate sets of variables. CCA finds pairs of linear combinations (canonical variates), one from each set, that are maximally correlated with each other. This shows how objective societal conditions relate to subjective well-being and perceptions; the associations it reveals are correlational and do not by themselves establish that one set influences the other.
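For reference, the quantity CCA maximises can be written in standard textbook notation. The symbols below are introduced here for illustration and are not taken from the notebook: X and Y are the two centred data matrices (one row per country), a and b are weight vectors, and the Sigma terms are the sample covariance blocks.

```latex
\rho_1 \;=\; \max_{a,\,b}\ \operatorname{corr}(Xa,\, Yb)
       \;=\; \max_{a,\,b}\ \frac{a^{\top}\Sigma_{XY}\,b}
                                {\sqrt{a^{\top}\Sigma_{XX}\,a}\;\sqrt{b^{\top}\Sigma_{YY}\,b}}
```

Later pairs of canonical variates solve the same problem under the constraint that they are uncorrelated with the earlier pairs, so the number of pairs is at most the size of the smaller variable set.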

Dataset Information

The analysis utilizes two primary datasets, data1.csv and data2.csv, downloaded from ourworldindata.org. These files contain statistical survey results, with the latest available data before 2020 for 22 countries.

data1.csv includes:

happiness: Self-reported life satisfaction.
trust_level: Share of people who agree with "most people can be trusted."
chocolate: Per capita consumption of cocoa beans (in kg).

data2.csv includes:

annual_work: Average number of annual work hours.
food_cost: Share of income spent on food.
meat_yearly: Yearly supply of meat per person.
overweight: Share of the adult population that is overweight or obese.
articles_per_million: Number of research articles published in a year per million of population.
create_research: Number of research and development professionals per million of population.
university_enrolment: Gross enrollment ratio in tertiary education.
electdem: Electoral democracy index.

The project also integrates a third dataset of my choice from ourworldindata.org, chosen so that 2019 values are available for all 22 countries, which extends the scope of the CCA.
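Joining an extra indicator onto the existing tables might be sketched as follows. This is a minimal illustration, not the notebook's code: the key column name `country`, the helper `add_indicator`, the column `new_indicator`, and all values in the stand-in tables are assumptions made up for the example. The real inputs would be read with `pd.read_csv`.

```python
import pandas as pd


def add_indicator(base: pd.DataFrame, extra: pd.DataFrame, key: str = "country") -> pd.DataFrame:
    """Left-join `extra` onto `base`; raise if any country in `base` has no match.

    `key` is an assumed column name; adjust it to whatever the CSV files use.
    """
    merged = base.merge(
        extra,
        on=key,
        how="left",
        validate="one_to_one",  # each country may appear only once per table
        indicator=True,         # adds a "_merge" column describing each row's origin
    )
    missing = merged.loc[merged["_merge"] == "left_only", key].tolist()
    if missing:
        raise ValueError(f"no data in the new dataset for: {missing}")
    return merged.drop(columns="_merge")


# Stand-in tables with placeholder values (NOT real Our World in Data figures).
base = pd.DataFrame({"country": ["A", "B", "C"], "happiness": [1.0, 2.0, 3.0]})
extra = pd.DataFrame({"country": ["C", "A", "B"], "new_indicator": [30.0, 10.0, 20.0]})

combined = add_indicator(base, extra)
```

Failing on an unmatched country, rather than silently keeping a missing value, fits the requirement that the new dataset cover all 22 countries.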

Project Structure and Methodology

The project follows a structured approach:

Data Import and Visualization: Both data1.csv and data2.csv are imported into Python. Initial histograms are generated to visualize the distribution of each variable.
Data Preprocessing: All variables are standardized before CCA so that differences in scale and units (hours, kilograms, percentages, index values) do not disproportionately influence the analysis.
Canonical Correlation Analysis (CCA) Implementation:

CCA is implemented and applied to the standardized datasets.
The initial results are interpreted to identify the canonical variates and their correlations, providing insights into the relationships between the objective and subjective variable sets.

Expanded Analysis: A new, relevant dataset is downloaded from ourworldindata.org (e.g., [You'd specify your chosen dataset and explain the rationale for choosing it here in the code or a separate documentation]). This new data is merged with the existing datasets.
Re-run CCA and Interpretation: CCA is re-run with the augmented dataset. The results are then re-interpreted, comparing them to the initial findings and discussing any new insights or changes in correlation patterns.
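The standardization and CCA steps above can be sketched with NumPy alone. This is one common way to compute CCA (a QR decomposition of each standardized block followed by an SVD) and is not necessarily how the notebook implements it. The function names `standardize` and `cca` are hypothetical, and the demo arrays are random stand-ins shaped like the two variable sets (22 rows, 3 and 8 columns), not the real data.

```python
import numpy as np


def standardize(a):
    """Centre each column and scale it to unit sample standard deviation."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def cca(x, y):
    """Canonical correlations and weights for x (n, p) and y (n, q).

    Assumes n is larger than p and q and that both blocks have full column rank.
    Returns (rho, wx, wy): rho holds the canonical correlations in descending
    order; standardize(x) @ wx and standardize(y) @ wy give the canonical variates.
    """
    xs, ys = standardize(x), standardize(y)
    qx, rx = np.linalg.qr(xs)  # orthonormal basis for the column space of xs
    qy, ry = np.linalg.qr(ys)
    u, s, vt = np.linalg.svd(qx.T @ qy, full_matrices=False)
    wx = np.linalg.solve(rx, u)     # map the basis rotation back to variable weights
    wy = np.linalg.solve(ry, vt.T)
    return np.clip(s, 0.0, 1.0), wx, wy


# Random stand-in data with the same shape as the two variable sets.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(22, 3))
y_demo = rng.normal(size=(22, 8))

rho, wx, wy = cca(x_demo, y_demo)
variates_x = standardize(x_demo) @ wx  # one column per canonical pair
variates_y = standardize(y_demo) @ wy
```

One caveat when interpreting the results: with only 22 countries and 11 variables, sample canonical correlations can be sizeable even for unrelated data, as running the sketch on the random arrays above illustrates. Comparing the initial and augmented runs is therefore more informative than reading a single correlation in isolation.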
