
This project analyzes athlete data from the 2008, 2012, and 2016 Summer Olympics, focusing on identifying and understanding sports that are outliers when genders are compared on athletes' physical attributes, using advanced machine learning techniques.


alper-sayin/OlympicsAnalysis


Olympic Athletes Outlier Analysis (2008-2016)

Description

This project analyzes athlete data from the 2008, 2012, and 2016 Summer Olympics to identify and understand sports that are outliers when male and female athletes are compared on physical attributes. Using advanced machine learning techniques, we explore how different sports deviate from the norm in the age, height, weight, and BMI of their athletes.

Key Features

  1. Data Preprocessing: Cleaned and prepared Olympic athlete data, handling duplicates and ensuring comparable sports across genders.

  2. Exploratory Data Analysis:

    • Visualized athlete counts by sport and gender using interactive Plotly graphs.
    • Created heatmaps to display average age, height, weight, and BMI across sports.
  3. Feature Engineering:

    • Calculated BMI for athletes.
    • Created rank difference features to compare male and female athletes within each sport.
  4. Outlier Detection:

    • Applied machine learning models (DBSCAN and Isolation Forest) to detect outliers.
    • Used Principal Component Analysis (PCA) for dimensionality reduction and visualization.
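
As a rough illustration of the preprocessing and feature engineering steps listed above, the sketch below runs on a tiny made-up table. The column names (`name`, `sex`, `sport`, `height`, `weight`) and the definition of the rank difference (rank of a sport's male average minus rank of its female average) are assumptions for illustration, not taken from the project's code.

```python
import pandas as pd

# Hypothetical toy rows; the real dataset's schema may differ.
rows = [
    ("a1", "M", "Rowing",   195.0, 95.0),
    ("a1", "M", "Rowing",   195.0, 95.0),  # exact duplicate on purpose
    ("a2", "F", "Rowing",   180.0, 75.0),
    ("a3", "M", "Judo",     180.0, 81.0),
    ("a4", "F", "Judo",     165.0, 57.0),
    ("a5", "M", "Fencing",  175.0, 70.0),
    ("a6", "F", "Fencing",  170.0, 60.0),
    ("a7", "M", "Baseball", 185.0, 90.0),  # only one gender present
]
athletes = pd.DataFrame(rows, columns=["name", "sex", "sport", "height", "weight"])

# Preprocessing: drop exact duplicate rows.
athletes = athletes.drop_duplicates()

# Keep only sports that have both male and female athletes.
genders_per_sport = athletes.groupby("sport")["sex"].nunique()
shared_sports = genders_per_sport[genders_per_sport == 2].index
athletes = athletes[athletes["sport"].isin(shared_sports)].copy()

# Feature engineering: BMI = weight (kg) / height (m) squared.
athletes["bmi"] = athletes["weight"] / (athletes["height"] / 100) ** 2

# Average height per sport and gender, then rank sports within each gender.
means = athletes.groupby(["sport", "sex"])["height"].mean().unstack("sex")
ranks = means.rank(ascending=False)

# Assumed rank-difference feature: male rank minus female rank per sport.
rank_diff = ranks["M"] - ranks["F"]
print(rank_diff)
```

A rank difference of zero means a sport sits at the same position in the male and female orderings; larger absolute values mark sports where the two orderings disagree.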

Technologies Used

  • Python
  • Pandas & NumPy for data manipulation
  • Matplotlib, Seaborn, and Plotly for visualization
  • Scikit-learn for machine learning algorithms (DBSCAN, Isolation Forest, PCA)
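
The scikit-learn pieces named above could fit together roughly as follows. This is a minimal sketch on synthetic data with one planted outlier; the feature matrix, `eps`, `min_samples`, and `contamination` values are illustrative assumptions, not the settings used in this project.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 20 hypothetical sports x 4 features
# (e.g. gender gaps in age, height, weight, BMI).
X = rng.normal(0.0, 1.0, size=(20, 4))
X[0] = [8.0, 8.0, 8.0, 8.0]  # planted outlier for demonstration

# Scale features so distance-based methods treat them equally.
X_scaled = StandardScaler().fit_transform(X)

# DBSCAN labels points that belong to no dense region as -1 (noise).
db_labels = DBSCAN(eps=2.0, min_samples=3).fit_predict(X_scaled)

# Isolation Forest returns -1 for outliers and 1 for inliers.
iso_labels = IsolationForest(contamination=0.05, random_state=0).fit_predict(X_scaled)

# PCA projects the 4 features onto 2 components for plotting.
coords = PCA(n_components=2).fit_transform(X_scaled)

print("DBSCAN outliers:", np.where(db_labels == -1)[0])
print("Isolation Forest outliers:", np.where(iso_labels == -1)[0])
```

Comparing the two index lists is one simple way to check whether the models agree on which rows are outliers.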

Key Findings & Results

  • Gender differences in age, height, weight, and BMI vary significantly across sports.
  • The analysis revealed patterns in how gender differences manifest in various Olympic sports.
  • In many sports, there are physical attribute similarities regardless of gender.
  • Boxing was consistently identified as an outlier sport when comparing genders across multiple physical metrics.
  • The DBSCAN and Isolation Forest models produced nearly identical results.
  • Z-score results are consistent with these findings.
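
A z-score check of the kind mentioned above can be sketched in a few lines. The sport labels and gap values below are invented for demonstration and are not results from this project; the cutoff of 2 standard deviations is likewise an assumption.

```python
import numpy as np

# Made-up per-sport gender gaps in a single metric (illustration only).
gaps = {
    "Sport A": 1.0, "Sport B": 1.5, "Sport C": 0.5, "Sport D": 1.2,
    "Sport E": 0.8, "Sport F": 1.1, "Sport G": 0.9, "Sport H": 9.0,
}

values = np.array(list(gaps.values()))

# z-score: distance from the mean in units of standard deviation.
z_scores = (values - values.mean()) / values.std()

# Flag sports whose gap is more than 2 standard deviations from the mean.
threshold = 2.0  # assumed cutoff
flagged = [sport for sport, z in zip(gaps, z_scores) if abs(z) > threshold]
print(flagged)
```

Because the z-score depends only on the mean and standard deviation, it gives a model-free cross-check on what the clustering-based detectors flag.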

Visualizations

  • Total athletes in dataset
  • Male athlete averages
  • Female athlete averages
  • Correlations between different attributes
  • Adjusting DBSCAN model parameters for optimization. The silhouette score is one indicator of model success.
  • Isolation Forest sensitivity analysis
  • DBSCAN clustering
  • Isolation Forest clustering
  • Z-score results
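
Tuning DBSCAN against the silhouette score might look like the sketch below. It uses two synthetic blobs and a small hand-picked grid of `eps` values; the data, the grid, and `min_samples=5` are assumptions for illustration, and noise points are excluded before scoring, which is one possible convention rather than necessarily the one used here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)

# Synthetic data: two well-separated 2-D blobs.
X = np.vstack([
    rng.normal(0.0, 0.3, size=(30, 2)),
    rng.normal(5.0, 0.3, size=(30, 2)),
])

scores = {}
for eps in (0.2, 0.5, 1.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    clustered = labels != -1  # ignore noise points when scoring
    n_clusters = len(set(labels[clustered]))
    # The silhouette score is only defined for at least 2 clusters.
    if n_clusters >= 2:
        scores[eps] = silhouette_score(X[clustered], labels[clustered])

best_eps = max(scores, key=scores.get)
print(scores, "best eps:", best_eps)
```

A silhouette score near 1 indicates compact, well-separated clusters, so the highest-scoring `eps` is a reasonable starting point, though it should be checked against how many points end up labeled as noise.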

Data Source

DataCamp Olympics Dataset
