Task 1: Data Cleaning and Preprocessing
🧹 Data Cleaning and Preprocessing Summary

✅ Task Overview
- I worked on a raw auto sales dataset which required cleaning and preprocessing using both Excel and Python (Jupyter Notebook). The steps I followed are outlined below:
- Initial Review (Excel):
  - Opened the raw dataset in Excel.
  - Cleaned up values by:
    - Converting text to lowercase
    - Removing special characters
    - Standardizing columns and values
  - Saved the cleaned version for use in Python.
- Python (Jupyter Notebook):
  - Loaded the cleaned Excel file using pandas.
  - Used libraries like pandas and numpy to (see the sketch after this overview):
    - Explore the data using .head(), .info(), .describe()
    - Standardize date formats
    - Remove duplicates
    - Validate column types and formats
- Repository & Submission:
  - Created a GitHub repository to host:
    - Cleaned dataset (CSV & Excel)
    - Python cleaning script
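A minimal sketch of the notebook steps listed above. The file name auto_sales_cleaned.xlsx and the sale_date column are assumptions for illustration, not the actual dataset:

```python
import pandas as pd

# Load the Excel file cleaned in the earlier step (file name is an assumption).
df = pd.read_excel("auto_sales_cleaned.xlsx")

# Explore structure, types, and summary statistics.
print(df.head())
df.info()
print(df.describe())

# Standardize the date format (sale_date is a hypothetical column name).
df["sale_date"] = pd.to_datetime(df["sale_date"], errors="coerce", dayfirst=True)

# Remove duplicate rows.
df = df.drop_duplicates()

# Validate column types after cleaning.
print(df.dtypes)
```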
💡 Key Data Cleaning Concepts (Interview Q&A)
- What are missing values and how do you handle them? Missing values are blanks or NaN in datasets. They can be handled by:
  - Removing rows (dropna())
  - Filling with appropriate values (fillna()), like mean, median, or mode.
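For example, a minimal sketch of both options on a toy DataFrame (the column names are invented):

```python
import numpy as np
import pandas as pd

# Toy data with missing entries (column names are invented for illustration).
df = pd.DataFrame({
    "price": [12000, np.nan, 15500, 14000],
    "fuel_type": ["petrol", "diesel", None, "petrol"],
})

# Option 1: drop any row that contains a missing value.
dropped = df.dropna()

# Option 2: fill missing values with a statistic suited to each column.
filled = df.copy()
filled["price"] = filled["price"].fillna(filled["price"].median())               # numeric: median (or mean)
filled["fuel_type"] = filled["fuel_type"].fillna(filled["fuel_type"].mode()[0])  # categorical: mode
```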
- How do you treat duplicate records? Duplicates are removed using drop_duplicates() to ensure accuracy and avoid bias in analysis.
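For instance, on a small made-up DataFrame:

```python
import pandas as pd

# Toy data with one exact duplicate row (values are invented for illustration).
df = pd.DataFrame({
    "model": ["civic", "corolla", "civic"],
    "price": [22000, 21000, 22000],
})

print(df.duplicated().sum())   # 1 exact duplicate row
df = df.drop_duplicates()      # keeps the first occurrence by default
```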
- Difference between dropna() and fillna() in pandas?
  - dropna() removes rows or columns with null values.
  - fillna() replaces missing values with a specified value (like the mean, median, etc.).
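The practical difference shows up in the result's shape; a small sketch on toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

rows_dropped = df.dropna()        # fewer rows: any row containing NaN is removed
cols_dropped = df.dropna(axis=1)  # fewer columns: any column containing NaN is removed
filled = df.fillna(df.mean())     # same shape: NaNs replaced column-wise with each column's mean
```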
- What is outlier treatment and why is it important? Outlier treatment involves identifying and handling extreme values that can skew analysis. Techniques include capping, transformation, or removal.
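One common capping rule is the 1.5 * IQR fence; a minimal sketch with made-up prices:

```python
import pandas as pd

prices = pd.Series([9500, 11000, 12500, 13000, 14000, 95000])  # 95000 is an obvious outlier

# Cap values outside the 1.5 * IQR fences instead of dropping them.
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
capped = prices.clip(lower=lower, upper=upper)  # extreme values are pulled back to the fences
```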
- Explain the process of standardizing data. Standardization means bringing data to a common format, e.g. consistent text casing, date formats, units, and column naming conventions.
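A small sketch of text and column-name standardization (the column and values are invented):

```python
import pandas as pd

df = pd.DataFrame({" Fuel Type ": ["Petrol", "DIESEL", " petrol"]})

# Standardize column names: trim whitespace, lowercase, snake_case.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Standardize text values: trim stray spaces and use a consistent casing.
df["fuel_type"] = df["fuel_type"].str.strip().str.lower()
```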
- How do you handle inconsistent data formats (e.g., date/time)? Use pd.to_datetime() with errors='coerce' (so unparseable entries become NaT rather than raising an error) and dayfirst=True when dates are written day-first, to convert all dates into a standard format.
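For instance, with day-first date strings and one bad entry (values invented for illustration):

```python
import pandas as pd

dates = pd.Series(["05/01/2024", "13/01/2024", "not a date"])

# errors='coerce' turns the bad entry into NaT; dayfirst=True reads 05/01/2024 as 5 January 2024.
parsed = pd.to_datetime(dates, errors="coerce", dayfirst=True)
```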
- What are common data cleaning challenges?
  - Inconsistent formatting
  - Missing or incorrect values
  - Duplicate entries
  - Mixed data types
  - Unstructured or poorly labeled columns
- How can you check data quality? Use .info(), .isnull().sum(), .duplicated().sum(), data type checks, and domain knowledge to validate and ensure clean, high-quality data.
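A quick sketch of these checks on the cleaned file from the repository (the file name is an assumption):

```python
import pandas as pd

df = pd.read_csv("auto_sales_cleaned.csv")  # file name is an assumption

df.info()                      # column types and non-null counts
print(df.isnull().sum())       # missing values per column
print(df.duplicated().sum())   # number of exact duplicate rows
print(df.dtypes)               # confirm each column has the expected type
```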