This project explores and visualizes data using the R programming language. The first problem examines realdirect.com's datasets for several NYC boroughs, with the goal of better understanding user navigation and website organization. It proceeds through the following steps:
Data Loading and Cleaning: Import datasets for different boroughs and ensure data cleanliness. Address outliers, missing values, and data format issues, ensuring that numerical values are correctly identified.
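As a rough illustration of the cleaning step, the sketch below converts text-formatted numbers to numeric values and drops unusable rows. It uses a small made-up data frame in place of a real borough file. The column names sale_price and gross_sqft, the helper to_number, and the choice to treat zero values as missing are all assumptions made for this example, not details taken from the datasets.

```r
# Hypothetical raw extract standing in for one borough file.
# Column names and values are invented for illustration.
raw <- data.frame(
  sale_price = c("$450,000", "$0", "1,250,000", "-", "$89,000"),
  gross_sqft = c("1,200", "0", "2,400", "950", ""),
  stringsAsFactors = FALSE
)

# Keep only digits and the decimal point, then convert to numeric.
# Strings with nothing left (for example "-" or "") become NA.
to_number <- function(x) {
  as.numeric(gsub("[^0-9.]", "", x))
}

clean <- data.frame(
  sale_price = to_number(raw$sale_price),
  gross_sqft = to_number(raw$gross_sqft)
)

# Assumption: a zero price or zero area is not a usable observation,
# so it is recoded as missing before filtering.
clean$sale_price[which(clean$sale_price == 0)] <- NA
clean$gross_sqft[which(clean$gross_sqft == 0)] <- NA

# Keep only rows where every field is present.
clean <- clean[complete.cases(clean), ]
print(clean)
```

With a real file, the same helper could be applied to each column that should be numeric after loading it with read.csv.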
Exploratory Data Analysis (EDA) - Real Estate: Conduct a comprehensive EDA. Visualize and compare residential building categories across boroughs and time periods using histograms, boxplots, and scatterplots, and present summary statistics to support the findings.
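One possible shape for the summary-statistics and boxplot part of this step is sketched below in base R. The data frame is invented, and the column names borough and sale_price are assumptions. The log scale is a common choice for skewed prices rather than something the project prescribes.

```r
# Toy sales table; values and column names are illustrative only.
sales <- data.frame(
  borough    = c("Manhattan", "Manhattan", "Manhattan",
                 "Brooklyn", "Brooklyn", "Brooklyn"),
  sale_price = c(900000, 1500000, 2100000, 400000, 650000, 750000)
)

# Median sale price per borough as a simple summary statistic.
stats <- aggregate(sale_price ~ borough, data = sales, FUN = median)
print(stats)

# Boxplot of log10 price by borough, written to a temporary PDF so the
# sketch also runs in a non-interactive session.
plot_file <- file.path(tempdir(), "price_by_borough.pdf")
pdf(plot_file)
boxplot(log10(sale_price) ~ borough, data = sales,
        xlab = "Borough", ylab = "log10(sale price)")
invisible(dev.off())
```

The same pattern extends to histograms with hist() and scatterplots with plot(), and to grouping by building category instead of borough.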
The second problem covers ad click data and proceeds through the following steps:
Data Handling - Ad Click Data: Use R to manage the nyt1.csv, nyt2.csv, and nyt3.csv datasets. Each file records one day of ad impressions and clicks on the New York Times homepage, with one row per user.
Age Group Categorization: Create an age_group variable to categorize users into distinct age groups (<20, 20-29, 30-39, 40-49, 50-59, 60-69, 70+).
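The categorization above can be sketched with base R's cut(). The column name Age is an assumption about the files, and the ages below are made up. Setting right = FALSE makes each interval include its lower bound, so an age of exactly 20 falls in 20-29.

```r
# Toy ages; the real values would come from the loaded CSV files.
users <- data.frame(Age = c(15, 19, 20, 29, 30, 45, 59, 60, 69, 70, 88))

# Intervals are [lower, upper), so 20 goes to "20-29" and 70 to "70+".
users$age_group <- cut(
  users$Age,
  breaks = c(-Inf, 20, 30, 40, 50, 60, 70, Inf),
  labels = c("<20", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"),
  right  = FALSE
)

print(table(users$age_group))
```

If the data uses a placeholder age such as 0 for users whose age is unknown, those rows would land in the <20 group, so it may be worth checking for that before interpreting the youngest category.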
Exploratory Data Analysis (EDA) - Ad Clicks: Working one day at a time, analyze impressions and click-through rate (CTR) by age group. Define new variables that segment users by click behavior to support deeper analysis.
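A minimal sketch of this step follows, taking CTR to be total clicks divided by total impressions within a group. The column names Impressions and Clicks, the segment labels, and the toy rows are assumptions for illustration.

```r
# Toy single-day table, one row per user.
age_levels <- c("<20", "20-29", "30-39")
day1 <- data.frame(
  age_group   = factor(c("<20", "<20", "20-29", "20-29", "30-39"),
                       levels = age_levels),
  Impressions = c(5, 3, 4, 0, 6),
  Clicks      = c(1, 0, 0, 0, 2)
)

# Hypothetical click-behavior segment for each user.
day1$segment <- ifelse(day1$Impressions == 0, "no_impressions",
                       ifelse(day1$Clicks > 0, "clicker", "non_clicker"))

# Sum impressions and clicks per age group, then derive CTR.
totals <- aggregate(cbind(Impressions, Clicks) ~ age_group,
                    data = day1, FUN = sum)
totals$ctr <- totals$Clicks / totals$Impressions
print(totals)
```

Computing CTR from group totals avoids dividing by zero for individual users who saw no ads. Averaging per-user rates is an alternative, but it would need those users handled separately.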
Comparative Analysis: Use visualizations and quantitative assessments to compare user segments and demographics, for example males versus females under 20, or logged-in versus not-logged-in users.
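As one way to make such a comparison concrete, the sketch below restricts a toy table to the under-20 group and computes CTR by gender. The column names and the "F"/"M" coding are assumptions; the real files may encode gender differently, for example as a numeric flag.

```r
# Toy user table; values and the gender coding are illustrative only.
users <- data.frame(
  age_group   = c("<20", "<20", "<20", "<20", "20-29", "20-29"),
  Gender      = c("F", "M", "F", "M", "F", "M"),
  Impressions = c(4, 5, 6, 5, 3, 7),
  Clicks      = c(1, 2, 0, 0, 1, 0)
)

# Restrict to the segment of interest, then total by gender.
under20 <- subset(users, age_group == "<20")
by_gender <- aggregate(cbind(Impressions, Clicks) ~ Gender,
                       data = under20, FUN = sum)
by_gender$ctr <- by_gender$Clicks / by_gender$Impressions
print(by_gender)
```

Swapping Gender for a sign-in indicator column would give the logged-in versus not-logged-in comparison in the same way.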
Temporal Analysis: Extend the analysis over multiple days to observe trends and changes over time. Visualize metrics and distributions across different time periods.
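A sketch of the multi-day extension: tag each day's table with a day index, stack the tables, and compute one CTR per day. In-memory toy tables stand in for the three CSV files here, and the column names and the day variable are assumptions.

```r
# With the real files, this list might instead be built with something like:
#   days <- lapply(c("nyt1.csv", "nyt2.csv", "nyt3.csv"), read.csv)
days <- list(
  data.frame(Impressions = c(5, 3, 4), Clicks = c(1, 0, 0)),
  data.frame(Impressions = c(6, 2),    Clicks = c(1, 1)),
  data.frame(Impressions = c(4, 4, 2), Clicks = c(0, 0, 1))
)

# Record which day each row came from, then stack all days.
for (i in seq_along(days)) {
  days[[i]]$day <- i
}
all_days <- do.call(rbind, days)

# Daily totals and daily CTR.
daily <- aggregate(cbind(Impressions, Clicks) ~ day,
                   data = all_days, FUN = sum)
daily$ctr <- daily$Clicks / daily$Impressions
print(daily)

# Line plot of CTR over time, saved to a temporary PDF.
trend_file <- file.path(tempdir(), "ctr_by_day.pdf")
pdf(trend_file)
plot(daily$day, daily$ctr, type = "b", xlab = "Day", ylab = "CTR")
invisible(dev.off())
```

Keeping the day column in the stacked table means the age-group and segment breakdowns can be repeated per day by adding day to the grouping formula.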
This project provides hands-on experience in data handling, cleaning, exploration, and visualization using R. By delving into both real estate and ad click datasets, participants gain insights into user behavior, website usability, and advertising trends.