Statistical analysis of the Online Retail II dataset using R. Includes data cleaning, EDA, hypothesis testing, and business insights with visualizations.

erenbg1/online-retail-statistical-analysis

Online Retail Statistical Analysis

📌 Project Overview

This project performs statistical analysis on the Online Retail II dataset (Kaggle).
The goal is to explore customer purchasing patterns, identify high-value segments, and test hypotheses regarding sales behavior across regions and time periods.

✨ Features

  • Data cleaning and preprocessing
  • Exploratory Data Analysis (EDA)
  • Statistical hypothesis testing (t-tests, ANOVA, Chi-Square, etc.)
  • Visualization of sales trends and customer behavior
  • Business insights from data-driven results

📂 Repository Structure

online-retail-statistical-analysis/
│
├── data/                   # Dataset (not included due to license)
├── scripts/                # R scripts for analysis
├── images/                 # Generated plots
├── requirements.txt        # Dependencies (R packages)
└── README.md               # Documentation

🚀 Getting Started

1. Clone the repository

git clone https://github.com/erenbg1/online-retail-statistical-analysis.git
cd online-retail-statistical-analysis

2. Install dependencies

In R, install required libraries:

install.packages(c("tidyverse", "ggplot2", "dplyr", "lubridate", "car"))
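If some of these packages are already installed, a small helper can limit the install to what is missing. This is a minimal sketch, not part of the repository; `missing_packages` is a hypothetical helper name, and the package list simply mirrors the command above.

```r
# Hypothetical helper (not in the repository): return the subset of
# `required` that is not yet installed in the current R library paths.
missing_packages <- function(required) {
  setdiff(required, rownames(installed.packages()))
}

# Package list copied from the install command in this README
required <- c("tidyverse", "ggplot2", "dplyr", "lubridate", "car")
to_install <- missing_packages(required)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so that running the snippet only reports what is missing.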

3. Run analysis

Open the scripts in the scripts/ folder and run them in the order shown, either in RStudio or from an R session in the terminal:

source("scripts/data_cleaning.R")
source("scripts/eda_visualizations.R")
source("scripts/hypothesis_testing.R")

📊 Example Results

  • Top 5 Countries by Sales Volume: United Kingdom, Netherlands, Germany, France, Ireland
  • Hypothesis Test: A significant difference in sales volume was found between weekdays and weekends.
  • Visualization: Monthly sales trend shows strong seasonality (peaks in Q4).
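The kind of comparison behind the first two results can be sketched in base R. The example below runs on synthetic data, not the real dataset, so its numbers say nothing about the results above. The column names (`InvoiceDate`, `Quantity`, `Price`, `Country`) and the choice of a Welch t-test on daily revenue totals are assumptions for illustration; the scripts in scripts/ may use a different test or aggregation.

```r
# Synthetic stand-in for the cleaned dataset (column names are assumed)
set.seed(42)
n <- 500
sales <- data.frame(
  InvoiceDate = as.POSIXct("2011-01-01", tz = "UTC") + sample.int(364 * 86400, n),
  Quantity    = sample(1:20, n, replace = TRUE),
  Price       = round(runif(n, 0.5, 20), 2),
  Country     = sample(c("United Kingdom", "Netherlands", "Germany",
                         "France", "Ireland", "Spain"), n, replace = TRUE)
)
sales$Revenue <- sales$Quantity * sales$Price

# Top 5 countries by total revenue
country_totals <- tapply(sales$Revenue, sales$Country, sum)
top_countries  <- head(sort(country_totals, decreasing = TRUE), 5)

# Aggregate revenue per calendar day, then label each day
sales$Day <- as.Date(sales$InvoiceDate, tz = "UTC")
daily <- aggregate(Revenue ~ Day, data = sales, FUN = sum)
# POSIXlt wday: 0 = Sunday, 6 = Saturday (locale independent)
is_weekend <- as.POSIXlt(daily$Day)$wday %in% c(0, 6)
daily$DayType <- ifelse(is_weekend, "weekend", "weekday")

# Welch two-sample t-test: does mean daily revenue differ by day type?
result <- t.test(Revenue ~ DayType, data = daily)
print(top_countries)
print(result)
```

Aggregating to daily totals first keeps each observation a full day of sales rather than a single invoice line, which is one reasonable reading of "sales volume".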

📈 Future Work

  • Perform advanced customer segmentation (RFM analysis, clustering).
  • Introduce predictive modeling for sales forecasting.
  • Add anomaly detection for unusual purchasing patterns.
  • Deploy interactive dashboards (Shiny).
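As a rough idea of what the RFM step could look like, the sketch below computes recency, frequency, and monetary value per customer in base R and assigns each a 1 to 3 score. It uses synthetic data; the column names, the snapshot date, and the three-bin rank scoring are all assumptions for illustration and are not taken from the repository.

```r
# Synthetic transactions (column names are assumed for illustration)
set.seed(1)
tx <- data.frame(
  CustomerID  = sample(1001:1020, 200, replace = TRUE),
  InvoiceDate = as.Date("2011-01-01") + sample(0:364, 200, replace = TRUE),
  Revenue     = round(runif(200, 5, 500), 2)
)

# Reference date: one day after the last recorded purchase
snapshot <- max(tx$InvoiceDate) + 1

# Per-customer metrics; tapply orders all results by CustomerID
days_ago <- as.numeric(snapshot - tx$InvoiceDate)
recency  <- tapply(days_ago, tx$CustomerID, min)
rfm <- data.frame(
  CustomerID = as.integer(names(recency)),
  Recency    = as.vector(recency),
  # Counts rows here; with real data, count distinct invoices instead
  Frequency  = as.vector(tapply(tx$Revenue, tx$CustomerID, length)),
  Monetary   = as.vector(tapply(tx$Revenue, tx$CustomerID, sum))
)

# Rank-based score from 1 (lowest) to 3 (highest); ranks avoid tie problems
score3 <- function(x) {
  as.integer(cut(rank(x, ties.method = "first"), breaks = 3, labels = FALSE))
}
rfm$R_score <- 4L - score3(rfm$Recency)  # more recent purchase = higher score
rfm$F_score <- score3(rfm$Frequency)
rfm$M_score <- score3(rfm$Monetary)

print(head(rfm[order(-rfm$M_score, -rfm$F_score, -rfm$R_score), ]))
```

The resulting scores could then feed a clustering step or a simple rule-based segmentation.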

πŸ“ License

This project is released under the MIT License.
