I was recently assigned a project by Danalytics with the primary objective of identifying and addressing data quality issues using the Flipkart product dataset.
Flipkart is India's leading e-commerce platform, offering a wide range of products including smartphones, laptops, appliances, furniture, beauty products, books and more.
My task was to assess the quality of the data available for analysis and to document any issues found.
This matters because it shows how suitable the data is for analysis. If quality issues are left unaddressed, we risk basing our analysis on inaccurate or incomplete information, leading to unreliable insights and decisions.
- Data Collection: The data used in this project was sourced from Kaggle.
- Data Exploration: Using Excel Power Query, I generated a data profile to examine each attribute in the dataset.
- Documentation: I implemented a DQA (Data Quality Assessment) framework, establishing key metrics to assess the quality of the data. This documentation serves as a foundation for accurate analysis and decision-making.
As a result, I identified missing values amounting to about 0.69% of the entire dataset, duplicate records, inaccessible data, and irrelevant columns/information that cannot be used for analysis.
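For anyone who prefers code to Power Query, the two headline checks (overall share of missing cells and count of duplicate records) can be roughly sketched in plain Python. This is not the Power Query profile used in the project, only an illustrative equivalent: the `profile` function, the `MISSING` markers, and the sample rows and column names are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical markers treated as "missing"; adjust to the dataset at hand.
MISSING = {None, "", "NA", "N/A", "null"}


def profile(rows):
    """Return the overall missing-cell percentage and the duplicate-row count.

    `rows` is a list of dicts, one dict per record (e.g. from csv.DictReader).
    """
    total_cells = sum(len(row) for row in rows)
    missing_cells = sum(
        1 for row in rows for value in row.values() if value in MISSING
    )
    # A row counts as a duplicate when an identical row has already been seen.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicate_rows = sum(n - 1 for n in counts.values())
    missing_pct = 100 * missing_cells / total_cells if total_cells else 0.0
    return {"missing_pct": round(missing_pct, 2), "duplicate_rows": duplicate_rows}


# Made-up sample records, not taken from the actual dataset.
sample = [
    {"product_name": "Phone A", "brand": "X", "retail_price": "999"},
    {"product_name": "Phone A", "brand": "X", "retail_price": "999"},
    {"product_name": "Sofa B", "brand": "", "retail_price": None},
]
print(profile(sample))
```

Measuring missing values per cell rather than per row is a design choice; a per-column breakdown would be closer to what a column profile in Power Query shows.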
Exciting times ahead! Stay tuned as I dive into implementing the proposed solutions outlined in the DQA to optimize the dataset for analysis and visualization.
#DataQuality #ExcelPowerQuery #DataAnalysis #Ecommerce #ProjectHighlight