Cafe Sales Data Cleaning Project
This project cleans a dirty cafe sales dataset.
"Dirty" here means that calculated columns, or any other values, contain errors or unknown entries. I fill those error and unknown values using NumPy and numerical calculations.
-
PROJECT GOALS
-
Handle missing or incorrect values in both numeric and text data
-
Create clean, consistent datasets ready for analysis and modeling
-
FILES INCLUDED:-
-
| File Name | Description |
| --- | --- |
| dirty_cafe_sales.csv | Original raw dataset (optional) |
| clean_numeric.py | Script to clean numeric data |
| cleaned_numeric_data.csv | Cleaned numeric data |
| clean_text.py | Script to clean text columns |
| cleaned_cafe_sales_text.csv | Cleaned text/categorical data |
| final_cleaned_data.csv | Fully cleaned and merged dataset |
-
Cleaning Process
-
Replaced NaN, ERROR, and UNKNOWN in numeric columns using formulas.
-
Replaced dirty text values (UNKNOWN, ERROR) with "Missing".
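The two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of clean_numeric.py or clean_text.py. It assumes the numeric columns are a quantity, a unit price, and a total related by total = quantity * price; that relationship, the function names, and the sample values are all assumptions made for the example.

```python
import numpy as np

# Markers treated as missing (taken from the cleaning steps above).
BAD_TOKENS = {"", "NaN", "nan", "ERROR", "UNKNOWN"}


def to_float(values):
    """Convert raw values to floats, mapping bad tokens to np.nan."""
    return np.array(
        [np.nan if str(v).strip() in BAD_TOKENS else float(v) for v in values],
        dtype=float,
    )


def fill_numeric(quantity, price, total):
    """Fill one missing value per row from the other two.

    Assumes total = quantity * price (hypothetical column relationship).
    Rows with two or more missing values stay NaN.
    """
    q, p, t = to_float(quantity), to_float(price), to_float(total)
    # np.where evaluates both branches, so silence NaN/zero-division warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.where(np.isnan(t), q * p, t)
        q = np.where(np.isnan(q), t / p, q)
        p = np.where(np.isnan(p), t / q, p)
    return q, p, t


def clean_text(values, placeholder="Missing"):
    """Replace bad tokens in a text column with a placeholder."""
    cleaned = []
    for v in values:
        s = str(v).strip()
        cleaned.append(placeholder if s in BAD_TOKENS else s)
    return np.array(cleaned)


# Example with made-up values.
q, p, t = fill_numeric(
    ["2", "ERROR", "3", "1"],
    ["4.0", "2.5", "UNKNOWN", "5.0"],
    ["UNKNOWN", "5.0", "6.0", "5.0"],
)
payment = clean_text(["Cash", "UNKNOWN", "ERROR", " Card "])
```

Filling the total first and then the quantity and price means each row needs only two of the three values to be recoverable; rows that cannot be recovered are left as NaN rather than guessed.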
-
TOOLS USED
-
Python
-
NumPy
-
PyCharm (IDE)
-
NEXT STEPS
-
A script to normalize numerical fields, and a final normalized output for ML
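As one possible starting point for this step, here is a small min-max scaling sketch with NumPy. The project does not say which normalization method will be used, so min-max scaling and the function name min_max_normalize are assumptions made for illustration only.

```python
import numpy as np


def min_max_normalize(column):
    """Scale a numeric column to the range [0, 1], ignoring NaN values.

    Hypothetical helper: min-max scaling is an assumed choice of method.
    """
    col = np.asarray(column, dtype=float)
    lo, hi = np.nanmin(col), np.nanmax(col)
    if hi == lo:
        # A constant column has no spread: map values to 0 and keep NaN as NaN.
        return np.where(np.isnan(col), np.nan, 0.0)
    return (col - lo) / (hi - lo)


# Example with made-up values.
scaled = min_max_normalize([2.0, 4.0, 6.0])
```

Min-max scaling keeps the relative spacing of the values, which makes the output easy to check by eye; a z-score scaling would be an equally reasonable alternative.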
-
AUTHOR
-
SURYADEV05
-
B.Tech CSE (AIML)
-
Email:-
suryatt.05@gmail.com
-
LinkedIn:-
https://www.linkedin.com/in/thirukkovalluru-surya-teja-6771922bb/