This project performs statistical analysis on the Online Retail II dataset (Kaggle).
The goal is to explore customer purchasing patterns, identify high-value segments, and test hypotheses about sales behavior across regions and time periods. The analysis covers:
- Data cleaning and preprocessing
- Exploratory Data Analysis (EDA)
- Statistical hypothesis testing (t-tests, ANOVA, Chi-Square, etc.)
- Visualization of sales trends and customer behavior
- Business insights from data-driven results
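As an illustration of the data cleaning step, here is a minimal base-R sketch. It assumes the Online Retail II column names (`Invoice`, `Quantity`, `Price`, `Customer ID`, `Country`) and the dataset convention that invoice numbers starting with "C" are cancellations. The toy rows, the `clean_retail()` helper and its filtering rules are hypothetical and may differ from what `scripts/data_cleaning.R` actually does.

```r
# Hypothetical toy rows mimicking the assumed Online Retail II columns.
retail <- data.frame(
  Invoice       = c("100001", "100001", "C100002", "100003", "100004"),
  Quantity      = c(12, 48, -12, 6, 0),
  Price         = c(6.95, 2.10, 2.95, 1.25, 3.00),
  `Customer ID` = c(1, 1, 2, NA, 3),
  Country       = c("United Kingdom", "United Kingdom", "Australia",
                    "France", "Germany"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical cleaning helper (assumed rules, not the project's script).
clean_retail <- function(df) {
  keep <- !startsWith(df$Invoice, "C") &  # invoices starting with "C" are cancellations
    df$Quantity > 0 &                       # drop returns and zero quantities
    df$Price > 0 &                          # drop zero-price adjustments
    !is.na(df[["Customer ID"]])             # drop rows without a customer ID
  out <- df[keep, ]
  out$Revenue <- out$Quantity * out$Price   # line-level revenue
  out
}

cleaned <- clean_retail(retail)
print(cleaned[, c("Invoice", "Country", "Revenue")])
```

Whether to drop rows without a customer ID depends on the question: it is needed for customer-level work such as segmentation, but it discards revenue that country- or time-level summaries might want to keep.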
Project structure:

```
online-retail-statistical-analysis/
│
├── data/               # Dataset (not included due to license)
├── scripts/            # R scripts for analysis
├── images/             # Generated plots
├── requirements.txt    # Dependencies (R packages)
└── README.md           # Documentation
```
To reproduce the analysis, clone the repository:

```bash
git clone https://github.com/erenbg1/online-retail-statistical-analysis.git
cd online-retail-statistical-analysis
```

In R, install the required libraries:

```r
# tidyverse already installs ggplot2, dplyr and lubridate; they are listed explicitly here.
install.packages(c("tidyverse", "ggplot2", "dplyr", "lubridate", "car"))
```

Then run the scripts in the `scripts/` folder, in the order shown, from RStudio or an R session in the terminal:

```r
source("scripts/data_cleaning.R")
source("scripts/eda_visualizations.R")
source("scripts/hypothesis_testing.R")
```

Key results:
- Top 5 countries by sales volume: United Kingdom, Netherlands, Germany, France, Ireland
- Hypothesis test: A significant difference in sales volume was found between weekdays and weekends.
- Visualization: The monthly sales trend shows strong seasonality, with peaks in Q4.
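A weekday vs. weekend comparison like the one above can be sketched as a two-sample t-test on daily sales totals. This minimal base-R example uses synthetic daily revenue, not the real dataset, and the choice of Welch's t-test is an assumption; the project's `scripts/hypothesis_testing.R` may use a different test or a different sales measure (for example quantity instead of revenue). In the real analysis the daily totals would come from aggregating cleaned transactions by invoice date.

```r
# Synthetic daily revenue for illustration only (NOT the Online Retail II data).
set.seed(1)
dates   <- seq(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day")
wday    <- as.POSIXlt(dates)$wday          # 0 = Sunday, 6 = Saturday (locale-independent)
weekend <- wday %in% c(0, 6)
revenue <- rnorm(length(dates), mean = ifelse(weekend, 15000, 25000), sd = 4000)

daily <- data.frame(
  date     = dates,
  revenue  = revenue,
  day_type = factor(ifelse(weekend, "weekend", "weekday"))
)

# Welch two-sample t-test: the default in t.test(), no equal-variance assumption.
tt <- t.test(revenue ~ day_type, data = daily)
print(tt)

# Monthly totals, the aggregation behind a seasonality plot
# (the synthetic data has no built-in seasonality).
monthly <- tapply(daily$revenue, format(daily$date, "%m"), sum)
print(round(monthly))
```

If daily totals are heavily skewed, a rank-based alternative such as `wilcox.test(revenue ~ day_type, data = daily)` avoids the normality assumption behind the t-test.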
Planned extensions:
- Perform advanced customer segmentation (RFM analysis, clustering).
- Introduce predictive modeling for sales forecasting.
- Add anomaly detection for unusual purchasing patterns.
- Deploy interactive dashboards (Shiny).
This project is released under the MIT License.