augusto-rosa/mysql-data-cleaning

🧹 Layoffs Dataset – Data Cleaning with MySQL

This project performs a complete data cleaning process in MySQL on the global layoffs dataset ("layoffs") available on Kaggle. The goal is to transform the raw dataset into a clean, standardized, analytics-ready format that can later be used for data exploration, reporting, and visualization.


📌 Project Overview

The original dataset contains records of tech company layoffs from around the world. However, the raw data includes duplicates, inconsistent formats, and null values.

✨ This project focuses on:

Creating a staging table to preserve the raw dataset

Removing duplicate records

Standardizing categorical and numerical data

Fixing incorrect or inconsistent entries (e.g., dates, strings, formatting)

Creating a clean analytics table for further analysis
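
The staging and de-duplication steps listed above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the table names `layoffs` and `layoffs_staging`, the helper column `row_id`, and the `company` column are assumptions made for the example. It also presumes MySQL 8.0 or later (for window functions) and a raw table that has no primary key yet.

```sql
-- Assumed names: `layoffs` (raw import) and `layoffs_staging` (working copy).
-- Work on a copy so the raw import stays untouched.
CREATE TABLE layoffs_staging LIKE layoffs;

INSERT INTO layoffs_staging
SELECT * FROM layoffs;

-- Hypothetical surrogate key so individual rows can be targeted for deletion.
ALTER TABLE layoffs_staging
    ADD COLUMN row_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Rows that agree on every listed column are treated as duplicates.
-- ROW_NUMBER() labels the first such row 1 and later copies 2, 3, ...
DELETE s
FROM layoffs_staging AS s
JOIN (
    SELECT row_id,
           ROW_NUMBER() OVER (
               PARTITION BY company, industry, country,
                            total_laid_off, percentage_laid_off, `date`
               ORDER BY row_id
           ) AS row_num
    FROM layoffs_staging
) AS numbered
    ON numbered.row_id = s.row_id
WHERE numbered.row_num > 1;
```

The column list in `PARTITION BY` defines what counts as a duplicate, so in practice it should name every column of the real table rather than the subset shown here.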

πŸ” Why I Built This Project

I built this project to strengthen my SQL skills, particularly in data cleaning, a fundamental step in any data analysis workflow. Using a publicly available Kaggle dataset on global tech company layoffs, I aimed to simulate real-world data preprocessing scenarios. The objectives were:

Practice SQL-based data cleaning: Implementing techniques such as removing duplicates, handling null values, and standardizing data formats.

Understand data inconsistencies: Gaining insights into common data quality issues and how to address them effectively.

Prepare data for analysis: Transforming raw data into a structured format suitable for exploratory data analysis and visualization.

Build a portfolio project: Demonstrating my ability to clean and prepare data using SQL, showcasing this skill to potential employers and collaborators.


🎯 Project Goals

πŸ” Identify and remove duplicate records

✏️ Standardize text fields like industry and country names

πŸ”’ Convert and clean numeric fields like total_laid_off and percentage_laid_off

πŸ“† Clean and convert date values into proper MySQL DATE format

πŸ“Š Create an analytics table with appropriate data types and indexes
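
As a rough sketch of how the standardization, conversion, and analytics-table goals might look in MySQL, consider the statements below. Everything beyond the column names industry, country, total_laid_off, and percentage_laid_off is an assumption for illustration: the working table `layoffs_staging`, the target table `layoffs_analytics`, the `company` and `date` columns, the text-typed import, the month/day/year date format, and the chosen column sizes.

```sql
-- Assumed working table: `layoffs_staging`, with every column imported as text.

-- Text fields: trim stray whitespace and drop a trailing period from country.
UPDATE layoffs_staging
SET industry = TRIM(industry),
    country  = TRIM(TRAILING '.' FROM TRIM(country));

-- Numeric fields: turn blank strings into NULL before changing the type.
UPDATE layoffs_staging
SET total_laid_off      = NULLIF(TRIM(total_laid_off), ''),
    percentage_laid_off = NULLIF(TRIM(percentage_laid_off), '');

ALTER TABLE layoffs_staging
    MODIFY COLUMN total_laid_off INT,
    MODIFY COLUMN percentage_laid_off DECIMAL(5, 4);  -- assumes fractions such as 0.25

-- Dates: assumes text such as '12/31/2022' (month/day/year).
UPDATE layoffs_staging
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging
    MODIFY COLUMN `date` DATE;

-- Analytics table (hypothetical name and sizes) with typed columns and indexes.
CREATE TABLE layoffs_analytics (
    id                  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    company             VARCHAR(255),
    industry            VARCHAR(100),
    country             VARCHAR(100),
    total_laid_off      INT,
    percentage_laid_off DECIMAL(5, 4),
    layoff_date         DATE,
    INDEX idx_industry (industry),
    INDEX idx_country (country),
    INDEX idx_layoff_date (layoff_date)
);

INSERT INTO layoffs_analytics
    (company, industry, country, total_laid_off, percentage_laid_off, layoff_date)
SELECT company, industry, country, total_laid_off, percentage_laid_off, `date`
FROM layoffs_staging;
```

Under strict SQL mode, values that do not match the assumed formats will make the `UPDATE` or `ALTER TABLE` statements fail, so it is worth inspecting the distinct values in each column before converting.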


🛠 Tools and Technologies

  • MySQL
  • MySQL Workbench / any SQL client (DBeaver was used for this project)
  • CSV File Source: Layoffs Dataset
  • SQL for data transformation
  • Git for version control

✅ Conclusion and Learnings

🎓 Completing this project provided several key takeaways:

Enhanced SQL skills: Improved my ability to write efficient and advanced SQL queries for data cleaning tasks.

Attention to data quality: Recognized the importance of clean data in deriving accurate insights.

Problem-solving: Developed strategies to deal with common data issues such as duplicates and inconsistent formats.

Preparedness for real-world data: Gained experience that is directly applicable to real-world data analysis projects.

This project solidified my understanding of data cleaning processes and underscored the critical role they play in the broader context of data analytics.
