Market Basket Analysis & Bundle Insight Generator

Overview

This project demonstrates a complete pipeline for Market Basket Analysis using the Apriori algorithm, enriched with GPT-powered bundle pricing insights. The goal is to:

Explore transaction data to understand purchase patterns.
Discover frequent itemsets and association rules (support, confidence, lift).
Calculate optimal bundle prices (with a configurable discount rate).
Generate human‑readable marketing recommendations using OpenAI GPT.

By combining classic data mining with modern LLMs, you can deliver both explainable rules and actionable bundle pricing strategies.

Key Concepts

Market Basket Analysis

Frequent Itemsets: Groups of items that appear together in transactions more often than a specified threshold (min_support).
Association Rules: Implications of the form A → B, evaluated by:
- Support: Proportion of transactions containing both A and B.
- Confidence: Probability of B given A (P(B|A)).
- Lift: Ratio of observed support to expected support if A and B were independent (>1 indicates positive correlation).
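
As a quick illustration of the three metrics, here is a minimal sketch in plain Python. The baskets and item names are made up for the example, and the helper names `support` and `rule_metrics` are hypothetical (they are not taken from the notebook).

```python
# Toy baskets; item names are invented for illustration only.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"butter", "jam"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, b = set(antecedent), set(consequent)
    supp_ab = support(a | b, transactions)   # P(A and B)
    supp_a = support(a, transactions)        # P(A)
    supp_b = support(b, transactions)        # P(B)
    confidence = supp_ab / supp_a            # P(B | A)
    lift = supp_ab / (supp_a * supp_b)       # observed vs. expected under independence
    return supp_ab, confidence, lift

supp, conf, lift = rule_metrics({"bread"}, {"butter"}, transactions)
print(f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
# prints: support=0.40 confidence=0.67 lift=1.11
```

The sketch assumes both A and B occur at least once; otherwise the divisions would fail.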

Apriori Algorithm

An iterative method that:

Generates candidate itemsets of increasing size.
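Prunes candidates that do not meet min_support; because every superset of an infrequent itemset is also infrequent, pruned candidates are never extended.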
Derives association rules from the remaining frequent itemsets.
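
The candidate-generation and pruning steps can be sketched in plain Python as below. This is an illustrative, unoptimized version written for readability, not the implementation used in the notebook; the function name `apriori_frequent_itemsets` and the toy baskets are assumptions.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = supp(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support and keep only the candidates that meet the threshold.
        current = {}
        for c in candidates:
            s = supp(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Toy baskets (made up for illustration).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"butter", "jam"},
    {"milk"},
]
result = apriori_frequent_itemsets(baskets, min_support=0.4)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), round(result[itemset], 2))
```

Rules are then derived by splitting each frequent itemset of two or more items into an antecedent and a consequent and scoring the split with confidence and lift.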

Bundle Pricing & GPT Enrichment

Raw Bundle Price: Sum of average unit prices of all items in the rule.
Suggested Bundle Price: Applies a configurable discount rate (e.g. 15%).
GPT Insight: A concise marketing recommendation that mentions the bundle price, generated by calling OpenAI’s gpt-3.5-turbo (or another selected model).
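
The pricing arithmetic can be sketched as follows. The function name `bundle_prices`, the item names, and the prices are hypothetical; only the 15% default discount comes from the description above.

```python
def bundle_prices(antecedents, consequents, avg_price, discount_rate=0.15):
    """Return (raw_price, suggested_price) for the items in one rule."""
    items = set(antecedents) | set(consequents)
    raw = sum(avg_price[item] for item in items)   # sum of average unit prices
    suggested = raw * (1 - discount_rate)          # apply the bundle discount
    return round(raw, 2), round(suggested, 2)

# Made-up average unit prices, for illustration only.
avg_price = {"TEA CUP": 2.95, "SAUCER": 1.65, "TEAPOT": 9.95}

raw, suggested = bundle_prices({"TEA CUP", "SAUCER"}, {"TEAPOT"}, avg_price)
print(raw, suggested)  # prints: 14.55 12.37
```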

Project Structure

├── data/
│   └── market_basket_dataset.csv    # Raw transactions (semicolon-delimited)
├── notebooks/
│   └── new_mba.ipynb                # Jupyter notebook with EDA, Apriori, GPT integration
├── output/
│   ├── bundle_insights.json        # Generated insights (raw JSON)
│   └── bundle_insights.md          # Markdown summary of bundle recommendations
├── README.md                        # This file
├── requirements.txt                 # Python dependencies
└── .env                             # Environment variables (OpenAI API key)

Market Basket Analysis Dataset:  
Kaggle: https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Setup

Clone the repo and navigate to its root:

git clone <repo_url>
cd market-basket-analysis

Create a virtual environment and install requirements:

python -m venv mba
source mba/bin/activate    # macOS/Linux
mba\Scripts\activate       # Windows
pip install --upgrade pip
pip install -r requirements.txt

Configure your OpenAI key in .env:
```
OPENAI_API_KEY=sk-...
```

Running the Analysis

Exploratory Data Analysis (EDA):
- Load market_basket_dataset.csv (handles semicolon delimiter and European decimals).
- Visualize: top items, quantity and price distributions (capped views and log scale), monthly transaction counts.
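
A minimal loading sketch with pandas, assuming the layout described above. The sample rows are made up, and the column names other than BillNo and Country, as well as the date format, are assumptions about the CSV; adjust them to the real header.

```python
import io
import pandas as pd

# Two invented rows: semicolon-separated, comma as the decimal mark.
sample = io.StringIO(
    "BillNo;Itemname;Quantity;Date;Price;CustomerID;Country\n"
    "1001;TEA CUP;6;01.12.2010 08:26;3,39;501;United Kingdom\n"
    "1002;SAUCER;-2;01.12.2010 08:28;1,85;501;United Kingdom\n"
)

df = pd.read_csv(
    sample,                  # replace with "data/market_basket_dataset.csv"
    sep=";",                 # semicolon delimiter
    decimal=",",             # European decimals: "3,39" is read as 3.39
    dtype={"BillNo": str},   # keep bill numbers as text (assumption)
)
# Assumed day.month.year timestamp format.
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y %H:%M")

# Monthly transaction counts: distinct bills per calendar month.
monthly = df.groupby(df["Date"].dt.to_period("M"))["BillNo"].nunique()
print(df.dtypes)
print(monthly)
```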
Apriori & Association Rules:
- Filter positive quantities, group by BillNo to form baskets.
- One-hot encode and run Apriori with min_support=0.01.
- Generate rules with lift > 1.2 and confidence > 0.3.
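
The basket-building and one-hot encoding step might look like the sketch below. The data is a toy frame, and `Itemname` and `Quantity` are assumed column names; only BillNo is named in this README.

```python
import pandas as pd

# Toy line items (made up); the negative quantity mimics a return.
df = pd.DataFrame({
    "BillNo":   ["1", "1", "2", "2", "2", "3", "3"],
    "Itemname": ["bread", "butter", "bread", "butter", "jam", "bread", "jam"],
    "Quantity": [1, 2, 1, 1, 1, 3, -1],
})

# Keep only positive quantities.
sold = df[df["Quantity"] > 0]

# One row per bill, one boolean column per item (True = item is in the basket).
basket = pd.crosstab(sold["BillNo"], sold["Itemname"]) > 0
print(basket)

# The column means are the supports of the single items.
print(basket.mean())
```

A boolean frame of this shape is the usual input for library Apriori implementations such as mlxtend's `apriori`; whether the notebook uses that library is an assumption here.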
Bundle Pricing Calculation:
- Compute average price per item from the dataset.
- Sum the average prices of all antecedent and consequent items to get the raw bundle price.
- Apply discount (default 15%) for suggested bundle price.
GPT Integration:
- Serialize the top N rules (default 10) with pricing fields to JSON.
- Craft a prompt asking GPT to return a JSON array of objects containing antecedents, consequents, bundle_price, and an insight sentence.
- Call OpenAI’s gpt-3.5-turbo for cost‑effective insights.
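
The serialization and prompt-building steps can be sketched without touching the network. The function name `build_prompt`, the prompt wording, the sample rules, and ranking the rules by lift are all assumptions for illustration; the notebook may select and phrase things differently.

```python
import json

def build_prompt(rules, top_n=10):
    """Serialize the top rules and wrap them in an instruction for the model."""
    top = sorted(rules, key=lambda r: r["lift"], reverse=True)[:top_n]
    payload = [
        {
            # Sets are not JSON-serializable, so convert them to sorted lists.
            "antecedents": sorted(r["antecedents"]),
            "consequents": sorted(r["consequents"]),
            "lift": round(r["lift"], 2),
            "bundle_price": r["bundle_price"],
        }
        for r in top
    ]
    return (
        "For each rule below, write one concise marketing recommendation that "
        "mentions the bundle price. Respond with only a JSON array of objects "
        "with the keys antecedents, consequents, bundle_price, and insight.\n\n"
        + json.dumps(payload, indent=2)
    )

# Made-up rules with pricing fields.
rules = [
    {"antecedents": frozenset({"TEA CUP"}), "consequents": frozenset({"SAUCER"}),
     "lift": 1.8, "bundle_price": 3.91},
    {"antecedents": frozenset({"TEAPOT", "TEA CUP"}), "consequents": frozenset({"SAUCER"}),
     "lift": 2.4, "bundle_price": 12.37},
]
print(build_prompt(rules, top_n=1))
```

The returned string would be sent as the user message of a chat completion request; the call itself is omitted here because it requires an API key.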
Output & Export:
- Parse GPT response into a pandas DataFrame.
- Rename columns for business users: Items Bought, Also Bought, Bundle Price, Recommendation.
- Sanitize text and print a GitHub‑flavored Markdown table.
- Save the JSON and Markdown to output/.
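
The parsing, renaming, and sanitizing steps can be sketched as follows. This version uses only the standard library rather than a pandas DataFrame, and the helper names and the sample response are hypothetical.

```python
import json

# Map the JSON keys to business-friendly column names.
COLUMNS = {
    "antecedents": "Items Bought",
    "consequents": "Also Bought",
    "bundle_price": "Bundle Price",
    "insight": "Recommendation",
}

def clean(value):
    """Flatten lists, collapse whitespace, and escape pipes for a table cell."""
    if isinstance(value, list):
        value = ", ".join(map(str, value))
    return " ".join(str(value).split()).replace("|", "\\|")

def to_markdown_table(response_text):
    """Turn a JSON array of insight objects into a GitHub-flavored Markdown table."""
    rows = json.loads(response_text)
    header = "| " + " | ".join(COLUMNS.values()) + " |"
    divider = "| " + " | ".join("---" for _ in COLUMNS) + " |"
    body = [
        "| " + " | ".join(clean(row.get(key, "")) for key in COLUMNS) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])

# Made-up model response containing a newline and a pipe to exercise clean().
sample_response = json.dumps([{
    "antecedents": ["TEA CUP"],
    "consequents": ["SAUCER"],
    "bundle_price": 3.91,
    "insight": "Offer both for 3.91 |\nsave 15%.",
}])
print(to_markdown_table(sample_response))
```

In practice the model may wrap its JSON in extra text, so `json.loads` should be guarded with error handling before the result is written to output/.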

Customization

Support & Lift Thresholds: Tweak min_support, lift > X, confidence > Y to refine rule quality.
Discount Rate: Change discount_rate in the pricing script to adjust bundle recommendations.
Model Selection: Swap model="gpt-3.5-turbo" for gpt-4o or gpt-4 if quality demands justify cost.
Country Filter: Add df = df[df.Country == "United Kingdom"] (or another) to focus on specific markets.

References & Further Reading

Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining Association Rules between Sets of Items in Large Databases. SIGMOD.
Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to Data Mining. Chapters on Association Analysis.
OpenAI API docs: https://platform.openai.com/docs

Happy analyzing and bundling!
