This project demonstrates a complete pipeline for Market Basket Analysis using the Apriori algorithm, enriched with GPT-powered bundle pricing insights. The goal is to:
- Explore transaction data to understand purchase patterns.
- Discover frequent itemsets and association rules (support, confidence, lift).
- Calculate optimal bundle prices (with a configurable discount rate).
- Generate human‑readable marketing recommendations using OpenAI GPT.
By combining classic data mining with modern LLMs, you can deliver both explainable rules and actionable bundle pricing strategies.
- Frequent Itemsets: Groups of items that appear together in transactions more often than a specified threshold (`min_support`).
- Association Rules: Implications of the form `A → B`, evaluated by:
  - Support: Proportion of transactions containing both A and B.
  - Confidence: Probability of B given A (`P(B|A)`).
  - Lift: Ratio of observed support to expected support if A and B were independent (`>1` indicates positive correlation).
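
As a worked illustration of the three metrics, here is a minimal standard-library sketch. The five toy transactions and the function names are made up for this example and are not taken from the project's notebook.

```python
# Toy transactions (illustrative only, not from the dataset).
transactions = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "jam"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
    frozenset({"butter", "milk"}),
]


def support(items):
    """Proportion of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence divided by the consequent's support (1.0 means independence)."""
    return confidence(antecedent, consequent) / support(consequent)


s = support({"bread", "butter"})       # 2 of 5 transactions
c = confidence({"bread"}, {"butter"})  # 2 of the 3 bread transactions
l = lift({"bread"}, {"butter"})        # (2/3) / (3/5)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Here the lift is slightly above 1, so bread and butter co-occur a little more often than independence would predict.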
Apriori is an iterative method that:
- Generates candidate itemsets of increasing size.
- Prunes those that do not meet `min_support`.
- Derives association rules from the remaining frequent itemsets.
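
The three steps above can be sketched in plain Python. This is a simplified illustration rather than the notebook's implementation: the function names (`apriori`, `derive_rules`) and the toy data are assumptions, and a real run would normally rely on an optimized library.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count support and prune candidates below the threshold.
        level = {}
        for candidate in current:
            sup = sum(candidate <= t for t in transactions) / n
            if sup >= min_support:
                level[candidate] = sup
        frequent.update(level)
        k += 1
        # Join frequent itemsets into size-k candidates, keeping only those
        # whose (k-1)-subsets are all frequent (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def derive_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent -> consequent rules."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append({
                        "antecedents": antecedent,
                        "consequents": consequent,
                        "support": sup,
                        "confidence": conf,
                        "lift": conf / frequent[consequent],
                    })
    return rules


# Toy data (illustrative only).
transactions = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "jam"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
    frozenset({"butter", "milk"}),
]
frequent = apriori(transactions, min_support=0.4)
rules = derive_rules(frequent, min_confidence=0.6)
for rule in rules:
    print(sorted(rule["antecedents"]), "->", sorted(rule["consequents"]),
          round(rule["lift"], 3))
```

Because every subset of a frequent itemset is itself frequent, the lookups `frequent[antecedent]` and `frequent[consequent]` always succeed.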
- Raw Bundle Price: Sum of average unit prices of all items in the rule.
- Suggested Bundle Price: Applies a configurable discount rate (e.g. 15%).
- GPT Insight: A concise marketing recommendation that mentions the bundle price, generated by calling OpenAI’s `gpt-3.5-turbo` (or another selected model).
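
A minimal sketch of the two price calculations, assuming prices arrive as `(item, unit_price)` pairs. The helper names and sample prices are hypothetical; only the `discount_rate` parameter and the 15% example come from this README.

```python
from collections import defaultdict


def average_prices(rows):
    """Average unit price per item from (item, unit_price) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # item -> [price sum, count]
    for item, price in rows:
        totals[item][0] += price
        totals[item][1] += 1
    return {item: total / count for item, (total, count) in totals.items()}


def bundle_prices(items, avg_price, discount_rate=0.15):
    """Return (raw bundle price, suggested bundle price after discount)."""
    raw = sum(avg_price[item] for item in items)
    suggested = raw * (1 - discount_rate)
    return round(raw, 2), round(suggested, 2)


# Sample prices (illustrative only).
rows = [("bread", 1.90), ("bread", 2.10), ("butter", 3.00)]
avg_price = average_prices(rows)
raw, suggested = bundle_prices(["bread", "butter"], avg_price)
print(raw, suggested)
```

The `items` argument is meant to hold the antecedents and consequents of one rule combined.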
├── data/
│ └── market_basket_dataset.csv # Raw transactions (semicolon-delimited)
├── notebooks/
│ └── new_mba.ipynb # Jupyter notebook with EDA, Apriori, GPT integration
├── output/
│ ├── bundle_insights.json # Generated insights (raw JSON)
│ └── bundle_insights.md # Markdown summary of bundle recommendations
├── README.md # This file
├── requirements.txt # Python dependencies
└── .env # Environment variables (OpenAI API key)
Market Basket Analysis Dataset:
Kaggle: https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
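
Because the file is semicolon-delimited and uses European decimals, a plain comma-separated parse will not read it correctly. The sketch below shows one way to parse it with the standard library and group rows into baskets. `BillNo` and `Country` are named in this README; the other column names (`Itemname`, `Quantity`, `Price`) and the sample rows are assumptions for illustration, so check them against the actual file header. It also assumes prices contain no thousands separators.

```python
import csv
import io

# Stand-in for the real file (rows are illustrative only).
SAMPLE = (
    "BillNo;Itemname;Quantity;Price;Country\n"
    "536365;WHITE METAL LANTERN;6;3,39;United Kingdom\n"
    "536365;CREAM CUPID HEARTS COAT HANGER;8;2,75;United Kingdom\n"
    "536366;HAND WARMER UNION JACK;-1;1,85;France\n"
)


def parse_decimal(text):
    """Convert a European-style decimal such as '3,39' to a float."""
    return float(text.strip().replace(",", "."))


def load_transactions(handle):
    """Read semicolon-delimited rows into typed dictionaries."""
    reader = csv.DictReader(handle, delimiter=";")
    return [
        {
            "BillNo": rec["BillNo"],
            "Itemname": rec["Itemname"].strip(),
            "Quantity": int(rec["Quantity"]),
            "Price": parse_decimal(rec["Price"]),
            "Country": rec["Country"],
        }
        for rec in reader
    ]


def to_baskets(rows):
    """Keep rows with positive quantity and group item names by BillNo."""
    baskets = {}
    for row in rows:
        if row["Quantity"] > 0:
            baskets.setdefault(row["BillNo"], set()).add(row["Itemname"])
    return baskets


rows = load_transactions(io.StringIO(SAMPLE))
baskets = to_baskets(rows)
print(baskets)
```

For the real dataset, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open("data/market_basket_dataset.csv", newline="", encoding="utf-8")`.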
- Clone the repo and navigate to its root:

      git clone <repo_url>
      cd market-basket-analysis

- Create a virtual environment and install requirements:

      python -m venv mba
      source mba/bin/activate   # macOS/Linux
      mba\Scripts\activate      # Windows
      pip install --upgrade pip
      pip install -r requirements.txt

- Configure your OpenAI key in `.env`:

      OPENAI_API_KEY=sk-...

- Exploratory Data Analysis (EDA):
  - Load `market_basket_dataset.csv` (handles semicolon delimiter and European decimals).
  - Visualize: top items, quantity and price distributions (capped views and log scale), monthly transaction counts.
- Apriori & Association Rules:
  - Filter positive quantities, group by `BillNo` to form baskets.
  - One-hot encode and run Apriori with `min_support=0.01`.
  - Generate rules with `lift > 1.2` and `confidence > 0.3`.
- Bundle Pricing Calculation:
  - Compute average price per item from the dataset.
  - Sum the average prices of the antecedent and consequent items to get the raw bundle price.
  - Apply the discount (default 15%) to get the suggested bundle price.
- GPT Integration:
  - Serialize the top N rules (default 10) with pricing fields to JSON.
  - Craft a prompt asking GPT to return a JSON array of objects containing `antecedents`, `consequents`, `bundle_price`, and an `insight` sentence.
  - Call OpenAI’s `gpt-3.5-turbo` for cost‑effective insights.
- Output & Export:
  - Parse GPT response into a pandas DataFrame.
  - Rename columns for business users: `Items Bought`, `Also Bought`, `Bundle Price`, `Recommendation`.
  - Sanitize text and print a GitHub‑flavored Markdown table.
  - Save the JSON and Markdown to `output/`.
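
The GPT Integration and Output & Export steps can be sketched without a network call. The example below builds a prompt from serialized rules and turns a JSON response into a GitHub-flavored Markdown table. It uses plain dictionaries instead of a pandas DataFrame to stay dependency-free, and the prompt wording, function names, and sample response are all hypothetical. The actual API call is omitted on purpose.

```python
import json

# Maps the JSON keys requested from GPT to business-friendly headers.
COLUMNS = {
    "antecedents": "Items Bought",
    "consequents": "Also Bought",
    "bundle_price": "Bundle Price",
    "insight": "Recommendation",
}


def build_prompt(rules, top_n=10):
    """Serialize the top N rules and wrap them in an instruction (wording assumed)."""
    payload = json.dumps(rules[:top_n], indent=2)
    return (
        "For each rule below, return a JSON array of objects with the keys "
        "antecedents, consequents, bundle_price, and insight (one sentence "
        "that mentions the bundle price).\n\n" + payload
    )


def sanitize(value):
    """Collapse whitespace and escape pipes so a cell cannot break the table."""
    if isinstance(value, list):
        value = ", ".join(str(v) for v in value)
    return " ".join(str(value).split()).replace("|", "\\|")


def to_markdown(response_text):
    """Parse a JSON array response into a GitHub-flavored Markdown table."""
    records = json.loads(response_text)
    lines = [
        "| " + " | ".join(COLUMNS.values()) + " |",
        "| " + " | ".join("---" for _ in COLUMNS) + " |",
    ]
    for record in records:
        cells = [sanitize(record.get(key, "")) for key in COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hand-written stand-ins for real rules and a real model response.
rules = [{"antecedents": ["bread"], "consequents": ["butter"],
          "raw_bundle_price": 5.0, "bundle_price": 4.25}]
prompt = build_prompt(rules)
sample_response = json.dumps([{
    "antecedents": ["bread"],
    "consequents": ["butter"],
    "bundle_price": 4.25,
    "insight": "Offer bread and butter together\nfor 4.25 to lift basket size.",
}])
table = to_markdown(sample_response)
print(table)
```

A real model response may not be valid JSON, so wrapping `json.loads` in error handling is advisable before writing files to `output/`.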
- Support & Lift Thresholds: Tweak `min_support`, `lift > X`, `confidence > Y` to refine rule quality.
- Discount Rate: Change `discount_rate` in the pricing script to adjust bundle recommendations.
- Model Selection: Swap `model="gpt-3.5-turbo"` for `gpt-4o` or `gpt-4` if quality demands justify cost.
- Country Filter: Add `df = df[df.Country == "United Kingdom"]` (or another country) to focus on specific markets.
- Agrawal, R. et al. (1993). Mining Association Rules between Sets of Items in Large Databases. SIGMOD.
- Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to Data Mining. Chapters on Association Analysis.
- OpenAI API docs: https://platform.openai.com/docs
Happy analyzing and bundling!