samrato/DataScience_004

🏆 World Cup 2018 Squads Data Cleaner

This project contains a Python function to load, clean, and prepare the 2018 FIFA World Cup squads data for use in analysis tools like Tableau.

📄 File: wrangle_world_cup_squads.py

Function: wrangle_world_cup_squads(file_path)

Purpose

Preprocess the 2018 World Cup squads dataset to ensure it is clean, standardized, and ready for visualization or analysis.

✅ Features

📥 Loads data from an Excel file

🧹 Cleans and standardizes string fields (e.g. team names, player names)

📆 Parses dates and calculates player ages

🔢 Ensures numeric fields like caps and goals are properly typed

🧮 Creates age group categories for better grouping in visual tools

📤 Exports cleaned data to CSV: world_cup_2018_squads_cleaned.csv
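The string and numeric cleaning steps listed above could look roughly like the sketch below. The script's actual implementation is not shown in this README, so the helper name `clean_strings_and_numbers` and the exact cleaning rules are assumptions made for illustration only.

```python
import pandas as pd


def clean_strings_and_numbers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy text columns and coerce Caps/Goals to integers."""
    out = df.copy()
    for col in ["Team", "Name"]:
        # Collapse runs of whitespace, then strip leading/trailing spaces.
        out[col] = (
            out[col].astype(str).str.replace(r"\s+", " ", regex=True).str.strip()
        )
    for col in ["Caps", "Goals"]:
        # Values that cannot be parsed as numbers become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Drop rows whose Caps or Goals could not be parsed.
    out = out.dropna(subset=["Caps", "Goals"])
    return out.astype({"Caps": int, "Goals": int})


# Made-up sample rows, not real squad data.
raw = pd.DataFrame(
    {
        "Team": ["  Brazil ", "France"],
        "Name": ["Player  One", "Player Two "],
        "Caps": ["10", "n/a"],
        "Goals": [3, 1],
    }
)
cleaned = clean_strings_and_numbers(raw)
```

The second sample row is dropped because its Caps value cannot be parsed, which mirrors the note that rows with invalid Caps or Goals are removed.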

📊 Output

The output CSV will include the following cleaned and enriched fields:

Type

Team

Group

Position

Name

Country and Club

DOB

Age

Caps

Goals

Age_Group (<20, 20-24, 25-29, 30+)
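One way to derive the four Age_Group buckets is `pandas.cut` with left-closed bins. This is a sketch under the assumption that the bucket boundaries are 20, 25, and 30; the script itself may compute the groups differently.

```python
import pandas as pd

# Made-up ages chosen to sit on the bucket boundaries.
ages = pd.Series([19, 20, 24, 25, 29, 30, 38], name="Age")

# right=False makes each bin include its lower edge: [0, 20), [20, 25), ...
age_group = pd.cut(
    ages,
    bins=[0, 20, 25, 30, 200],
    right=False,
    labels=["<20", "20-24", "25-29", "30+"],
)
```

With `right=False`, a 20-year-old falls into 20-24 rather than <20, which matches the labels.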

🧪 Example Usage

from wrangle_world_cup_squads import wrangle_world_cup_squads

# Load and clean the data
df_cleaned = wrangle_world_cup_squads("world_cup_2018_squads.xlsx")
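Because the function expects specific column names in the Excel file, it can help to check for them before calling it. The helper below is a hypothetical pre-flight check, not part of the script; `missing_columns` is a name introduced here for illustration.

```python
import pandas as pd

# Column names the README says the input file must contain.
EXPECTED_COLUMNS = [
    "Type", "Team", "Group", "Position", "Name",
    "Country and Club", "DOB", "Caps", "Goals",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Hypothetical check: return the expected columns absent from df, in order."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# An empty frame with only some of the expected headers.
incomplete = pd.DataFrame(columns=["Team", "Name", "DOB"])
missing = missing_columns(incomplete)
```

For a real file, the same check could be applied to the result of `pd.read_excel(...)` before running the cleaning function.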

⚠️ Notes

Rows with missing or invalid DOB, Caps, or Goals will be removed.

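Age is calculated relative to the current year, not the tournament dates (14 June to 15 July 2018), so it will differ from each player's age at the tournament, and the gap grows with every year that passes.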

Make sure the Excel file contains columns named exactly as expected: Type, Team, Group, Position, Name, Country and Club, DOB, Caps, Goals.
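If ages at the time of the tournament are needed, one option is to compute completed years against a fixed reference date instead of the current year. The sketch below is an alternative to the script's approach, not a description of it; `age_on` and `TOURNAMENT_START` are names introduced here, and the opening day of the tournament (14 June 2018) is used as the reference.

```python
import pandas as pd

# Opening day of the 2018 World Cup, used as a fixed reference date.
TOURNAMENT_START = pd.Timestamp("2018-06-14")


def age_on(dob: pd.Series, reference: pd.Timestamp) -> pd.Series:
    """Completed years between each date of birth and the reference date."""
    dob = pd.to_datetime(dob)
    years = reference.year - dob.dt.year
    # True where the birthday has already occurred by the reference date.
    had_birthday = (dob.dt.month < reference.month) | (
        (dob.dt.month == reference.month) & (dob.dt.day <= reference.day)
    )
    # Subtract one year where the birthday is still to come.
    return years - (~had_birthday).astype(int)


# Made-up dates of birth, not real players.
dobs = pd.Series(["1990-06-14", "1990-06-15", "1998-12-20"])
ages = age_on(dobs, TOURNAMENT_START)
```

A fixed reference date also makes the output reproducible: rerunning the script in a later year yields the same Age and Age_Group values.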

🗂️ Files

| File | Description |
|---|---|
| wrangle_world_cup_squads.py | Python script for cleaning the dataset |
| world_cup_2018_squads.xlsx | Input Excel file with raw data (not included) |
| world_cup_2018_squads_cleaned.csv | Output CSV file with cleaned data |

📈 Suggested Use in Tableau

Once you have world_cup_2018_squads_cleaned.csv, you can:

Import it into Tableau

Use Team, Group, Position, and Age_Group for filtering/grouping

Visualize age distribution, goals per age group, caps per team, etc.
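Before building views in Tableau, the same aggregates can be previewed in pandas as a sanity check. The rows below are made-up placeholders rather than real squad data; in practice the frame would come from reading the cleaned CSV.

```python
import pandas as pd

# Placeholder rows using the cleaned file's column names.
sample = pd.DataFrame(
    {
        "Team": ["A", "A", "B"],
        "Age_Group": ["20-24", "30+", "20-24"],
        "Caps": [10, 50, 20],
        "Goals": [2, 15, 4],
    }
)

# Total goals per age group and total caps per team.
goals_per_age_group = sample.groupby("Age_Group")["Goals"].sum()
caps_per_team = sample.groupby("Team")["Caps"].sum()
```

If these totals disagree with what Tableau shows for the same fields, the import step (for example, a column read as text) is a likely place to look.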

📌 Requirements

Python 3.7+

pandas

openpyxl (required for reading .xlsx files)

Install dependencies:

pip install pandas openpyxl
