🧹 Layoffs Dataset – Data Cleaning with MySQL

This project performs a complete data cleaning process in MySQL on the global layoffs dataset available on Kaggle. The goal is to transform the raw dataset into a clean, standardized, analytics-ready format that can later be used for data exploration, reporting, and visualization.

📌 Project Overview

The original dataset contains records of tech company layoffs from around the world. However, the raw data includes duplicates, inconsistent formats, and null values.

✨ This project focuses on:

Creating a staging table to preserve the raw dataset

Removing duplicate records

Standardizing categorical and numerical data

Fixing incorrect or inconsistent entries (e.g., dates, strings, formatting)

Creating a clean analytics table for further analysis
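As a rough illustration of the first two steps, here is a minimal sketch of creating a staging copy and removing exact duplicates. The table names (`layoffs`, `layoffs_staging`, `layoffs_staging2`) and most column names (`company`, `location`, `date`, `stage`, `funds_raised_millions`) are assumptions about how the Kaggle CSV was imported, not names confirmed by this README. It also assumes MySQL 8.0 or later, since `ROW_NUMBER()` is a window function.

```sql
-- Assumed: the raw CSV was imported into a table named `layoffs`.
-- Copy it into a staging table so the raw data stays untouched.
CREATE TABLE layoffs_staging LIKE layoffs;

INSERT INTO layoffs_staging
SELECT * FROM layoffs;

-- MySQL cannot DELETE directly from a CTE, so one option is to
-- materialize a row number into a second staging table.
-- Rows that match on every column share a partition; any row
-- numbered 2 or higher is a duplicate of row 1.
CREATE TABLE layoffs_staging2 AS
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, location, industry,
                        total_laid_off, percentage_laid_off,
                        `date`, stage, country, funds_raised_millions
           ORDER BY company
       ) AS row_num
FROM layoffs_staging;

-- Keep only the first row of each group of identical records.
DELETE FROM layoffs_staging2
WHERE row_num > 1;

-- The helper column is no longer needed.
ALTER TABLE layoffs_staging2
DROP COLUMN row_num;
```

Running `SELECT COUNT(*)` on both staging tables before and after the `DELETE` is a simple way to confirm how many duplicates were dropped. Clients that enable safe-update mode may require a key in the `WHERE` clause or the mode to be turned off for the session.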

🔍 Why I Built This Project

I built this project to strengthen my SQL skills, particularly in data cleaning, a fundamental step in any data analysis workflow. Using a publicly available Kaggle dataset on global tech company layoffs, I aimed to simulate real-world data preprocessing. The objectives were:

Practice SQL-based data cleaning: Implementing techniques such as removing duplicates, handling null values, and standardizing data formats.

Understand data inconsistencies: Gaining insights into common data quality issues and how to address them effectively.

Prepare data for analysis: Transforming raw data into a structured format suitable for exploratory data analysis and visualization.

Build a portfolio project: Demonstrating my ability to clean and prepare data using SQL, showcasing this skill to potential employers and collaborators.
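One common way to handle null values in SQL is to normalize blanks to `NULL` and then backfill from other rows that describe the same entity. The sketch below assumes a staging table named `layoffs_staging` with `company` and `location` columns; those names are illustrative and may differ from the actual script.

```sql
-- Treat empty or whitespace-only strings as missing values,
-- so a single IS NULL check covers both cases afterwards.
UPDATE layoffs_staging
SET industry = NULL
WHERE TRIM(industry) = '';

-- Backfill a missing industry from another row of the same
-- company and location that does have one (self-join update).
UPDATE layoffs_staging AS t1
JOIN layoffs_staging AS t2
  ON  t1.company  = t2.company
  AND t1.location = t2.location
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- Inspect rows that carry no layoff figures at all before
-- deciding whether to keep or drop them.
SELECT *
FROM layoffs_staging
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;
```

Whether rows with no layoff figures should be deleted depends on the analysis planned, so the last statement only lists them.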

🎯 Project Goals

🔍 Identify and remove duplicate records

✏️ Standardize text fields like industry and country names

🔢 Convert and clean numeric fields like total_laid_off and percentage_laid_off

📆 Clean and convert date values into proper MySQL DATE format

📊 Create an analytics table with appropriate data types and indexes
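The goals above could be approached along the following lines. This is a sketch under several assumptions: the staging table is called `layoffs_staging`, dates were imported as text in `MM/DD/YYYY` form in a column named `date`, `percentage_laid_off` holds a fraction between 0 and 1, and the target table `layoffs_clean` with its index names is hypothetical. Only `industry`, `country`, `total_laid_off`, and `percentage_laid_off` are column names taken from this README.

```sql
-- Standardize text fields: strip stray whitespace and a
-- trailing period (for example 'United States.').
UPDATE layoffs_staging
SET industry = TRIM(industry),
    country  = TRIM(TRAILING '.' FROM TRIM(country));

-- Convert text dates (assumed 'MM/DD/YYYY') to ISO format,
-- then change the column type to a real DATE.
UPDATE layoffs_staging
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging
MODIFY COLUMN `date` DATE;

-- Clean numeric fields: turn blank or literal 'NULL' strings
-- into real NULLs before changing the column types.
UPDATE layoffs_staging
SET total_laid_off = NULL
WHERE TRIM(total_laid_off) IN ('', 'NULL');

UPDATE layoffs_staging
SET percentage_laid_off = NULL
WHERE TRIM(percentage_laid_off) IN ('', 'NULL');

ALTER TABLE layoffs_staging
MODIFY COLUMN total_laid_off INT,
MODIFY COLUMN percentage_laid_off DECIMAL(5,4);

-- Analytics table with explicit types and indexes on the
-- columns most likely to be filtered or grouped on.
CREATE TABLE layoffs_clean (
    id                  INT AUTO_INCREMENT PRIMARY KEY,
    company             VARCHAR(255),
    industry            VARCHAR(100),
    country             VARCHAR(100),
    total_laid_off      INT,
    percentage_laid_off DECIMAL(5,4),
    layoff_date         DATE,
    INDEX idx_industry (industry),
    INDEX idx_country (country),
    INDEX idx_layoff_date (layoff_date)
);

INSERT INTO layoffs_clean
    (company, industry, country,
     total_laid_off, percentage_laid_off, layoff_date)
SELECT company, industry, country,
       total_laid_off, percentage_laid_off, `date`
FROM layoffs_staging;
```

Under strict SQL mode, `STR_TO_DATE` and the `MODIFY COLUMN` statements raise an error on values that do not parse, which is useful here: it surfaces malformed entries instead of silently storing bad data.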

🛠 Tools and Technologies

MySQL
MySQL Workbench or any other SQL client (DBeaver was used for this project)
CSV File Source: Layoffs Dataset
SQL for data transformation
Git for version control

✅ Conclusion and Learnings

🎓 Completing this project provided several key takeaways:

Enhanced SQL skills: Improved my ability to write efficient and advanced SQL queries for data cleaning tasks.

Attention to data quality: Recognized the importance of clean data in deriving accurate insights.

Problem-solving: Developed strategies to deal with common data issues such as duplicates and inconsistent formats.

Preparedness for real-world data: Gained experience that is directly applicable to real-world data analysis projects.

This project solidified my understanding of data cleaning processes and underscored the critical role they play in the broader context of data analytics.

