This project performs exploratory data analysis (EDA) and feature engineering on the Google Play Store dataset. The goal is to clean messy data, engineer meaningful features, and prepare the dataset for deeper analysis and potential machine learning modeling.
-- Data Cleaning & Preprocessing
-- Handling Missing and Invalid Values
-- String Manipulation and Conversion
-- Feature Engineering (e.g., extracting day, month, year from dates)
-- Data Type Optimization
-- Saving Processed Data
-- Working with Time-Series Data
-- Preparing Data for ML Models
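
The string conversion and date feature steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`parse_size_mb`, `parse_installs`, `parse_price`, `date_parts`) are hypothetical, and the raw value formats (`19M`, `201k`, `Varies with device`, `10,000+`, `$4.99`, `January 7, 2018`) are assumptions about how the CSV stores these fields.

```python
from datetime import datetime
from typing import Optional


def parse_size_mb(raw: str) -> Optional[float]:
    """Convert a size string such as '19M' or '201k' to megabytes.

    Returns None when the value carries no number (e.g. 'Varies with device').
    Assumption: 1 MB is treated as 1024 kB.
    """
    text = raw.strip()
    if not text:
        return None
    unit = text[-1].upper()
    try:
        value = float(text[:-1].replace(",", ""))
    except ValueError:
        return None
    if unit == "M":
        return value
    if unit == "K":
        return value / 1024
    return None


def parse_installs(raw: str) -> Optional[int]:
    """Convert an install bucket such as '10,000+' to the integer 10000."""
    digits = raw.strip().rstrip("+").replace(",", "")
    return int(digits) if digits.isdigit() else None


def parse_price(raw: str) -> Optional[float]:
    """Convert a price such as '$4.99' or '0' to a float."""
    try:
        return float(raw.strip().lstrip("$"))
    except ValueError:
        return None


def date_parts(raw: str) -> Optional[dict]:
    """Split a date such as 'January 7, 2018' into day, month, and year.

    Assumption: English month names; '%B' depends on the active locale.
    """
    try:
        parsed = datetime.strptime(raw.strip(), "%B %d, %Y")
    except ValueError:
        return None
    return {"day": parsed.day, "month": parsed.month, "year": parsed.year}


print(parse_size_mb("19M"), parse_installs("10,000+"), parse_price("$4.99"))
print(date_parts("January 7, 2018"))
```

Each helper returns `None` instead of raising, so invalid entries surface as missing values that can be handled in a later step. If the data lives in a pandas DataFrame (say, a hypothetical `df`), the same functions could be applied per column, for example `df["Size"].map(parse_size_mb)`.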
Source: Google Play Store Dataset (CSV): https://raw.githubusercontent.com/krishnaik06/playstore-Dataset/main/googleplaystore.csv
Fields: App Name, Category, Rating, Reviews, Size, Installs, Price, Type, Content Rating, Genres, Last Updated, Current Version, Android Version.
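
Before cleaning, it can help to see how many entries in each field are missing or hold a placeholder. The sketch below uses only the standard library and a few made-up rows; the header names (`App`, `Rating`, `Size`), the helper `count_missing`, and the set of strings treated as missing are all assumptions made for illustration, not properties confirmed from the dataset.

```python
import csv
import io
from collections import Counter

# Strings treated as "missing" here are an assumption, not a documented rule.
MISSING_TOKENS = {"", "nan", "varies with device"}


def count_missing(rows, fields):
    """Count, per field, the rows whose value is absent or a placeholder."""
    missing = Counter({field: 0 for field in fields})
    for row in rows:
        for field in fields:
            value = (row.get(field) or "").strip().lower()
            if value in MISSING_TOKENS:
                missing[field] += 1
    return dict(missing)


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """App,Rating,Size
Example App A,4.1,19M
Example App B,NaN,Varies with device
Example App C,,8.7M
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
report = count_missing(reader, ["Rating", "Size"])
print(report)
```

To audit a downloaded copy of the CSV instead, the `io.StringIO` object can be swapped for a file opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved.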
-- Advanced Cleaning: Handle missing values in Size with predictive imputation rather than dropping or ignoring.
-- Data Visualization: Create interactive visualizations using libraries like Plotly or Altair.
-- Statistical Analysis: Explore correlation between app size, installs, and ratings.
-- Clustering: Group apps into clusters based on installs, rating, and price.
-- Machine Learning: Predict an app's success (high installs/rating) based on features.
-- Deployment: Create a dashboard summarizing the cleaned dataset.
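
As a starting point for the statistical analysis item, a pairwise Pearson correlation over already-cleaned numeric columns might look like the sketch below. The helpers `pearson` and `correlation_matrix` are hypothetical names, the numbers are made up purely to show the call pattern, and taking the base-10 logarithm of installs is an assumption to tame the wide range of install counts, not something this project prescribes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of numbers."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for illustration only; they are not taken from the dataset.
sample = {
    "size_mb": [19.0, 8.7, 25.0, 2.8, 14.0],
    "log10_installs": [math.log10(n) for n in (10_000, 500_000, 5_000_000, 100_000, 50_000)],
    "rating": [4.1, 3.9, 4.7, 4.5, 4.3],
}

for name, row in correlation_matrix(sample).items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Pearson correlation only captures linear relationships, so a rank-based measure such as Spearman may be worth comparing on skewed fields like installs.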