shree6791/MarketBasketAnalysis


Market Basket Analysis & Bundle Insight Generator

Overview

This project demonstrates a complete pipeline for Market Basket Analysis using the Apriori algorithm, enriched with GPT-powered bundle pricing insights. The goal is to:

  1. Explore transaction data to understand purchase patterns.
  2. Discover frequent itemsets and association rules (support, confidence, lift).
  3. Calculate suggested bundle prices by applying a configurable discount rate.
  4. Generate human‑readable marketing recommendations using OpenAI GPT.

By combining classic data mining with modern LLMs, you can deliver both explainable rules and actionable bundle pricing strategies.


Key Concepts

Market Basket Analysis

  • Frequent Itemsets: Groups of items that appear together in transactions more often than a specified threshold (min_support).
  • Association Rules: Implications of the form A → B, evaluated by:
    • Support: Proportion of transactions containing both A and B.
    • Confidence: Probability of B given A (P(B|A)).
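    • Lift: Ratio of the observed support of A and B together to the support expected if A and B were independent, i.e. confidence(A → B) / support(B). Values above 1 indicate that A and B co-occur more often than chance; values below 1, less often.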
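As a worked illustration of the three metrics, here is a minimal sketch in plain Python. The function name `rule_metrics` and the toy baskets are made up for this example and are not part of the project's notebook.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, b = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)          # baskets containing A
    count_b = sum(1 for t in transactions if b <= t)          # baskets containing B
    count_ab = sum(1 for t in transactions if (a | b) <= t)   # baskets containing both
    if n == 0 or count_a == 0 or count_b == 0:
        raise ValueError("metrics are undefined when A or B never occurs")
    support = count_ab / n
    confidence = count_ab / count_a        # P(B | A)
    lift = confidence / (count_b / n)      # P(B | A) / P(B)
    return support, confidence, lift


# Toy data (illustrative only): four baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Here bread and butter appear together in 2 of 4 baskets (support 0.5), butter appears in 2 of the 3 bread baskets (confidence about 0.67), and butter's baseline rate is 0.5, so the lift is about 1.33.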

Apriori Algorithm

An iterative method that:

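  1. Generates candidate itemsets of increasing size, building each size-k candidate only from frequent itemsets of size k-1 (any superset of an infrequent itemset is itself infrequent).
  2. Prunes candidates whose support falls below min_support.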
  3. Derives association rules from the remaining frequent itemsets.
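The candidate-generation and pruning steps can be sketched in plain Python as follows. This is a teaching sketch, not the implementation the notebook uses; the function name `apriori` and the toy baskets are assumptions for illustration, and rule derivation (step 3) is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support and keep the candidates that meet the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy data (illustrative only).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With min_support=0.5, jam and milk are dropped at the first level, so no candidate containing them is ever counted.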

Bundle Pricing & GPT Enrichment

  • Raw Bundle Price: Sum of average unit prices of all items in the rule.
  • Suggested Bundle Price: Applies a configurable discount rate (e.g. 15%).
  • GPT Insight: A concise marketing recommendation that mentions the bundle price, generated by calling OpenAI’s gpt-3.5-turbo (or other selected model).
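The two price fields reduce to a sum and a multiplication. A minimal sketch, assuming a dictionary of average unit prices keyed by item name; the function name `bundle_prices`, the sample prices, and the rounding to two decimals are illustrative assumptions rather than details taken from the notebook.

```python
def bundle_prices(items, avg_price, discount_rate=0.15):
    """Return (raw_bundle_price, suggested_bundle_price) for the items in a rule."""
    raw = sum(avg_price[item] for item in items)
    suggested = raw * (1 - discount_rate)
    # Rounding to two decimals is an assumption made for display purposes.
    return round(raw, 2), round(suggested, 2)


# Made-up average unit prices for illustration.
avg_price = {"teapot": 12.00, "cup": 4.50, "saucer": 3.50}
raw, suggested = bundle_prices(["teapot", "cup", "saucer"], avg_price)
print(raw, suggested)
```

For these sample prices the raw bundle price is 20.00 and, at a 15% discount, the suggested bundle price is 17.00.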

Project Structure

├── data/
│   └── market_basket_dataset.csv    # Raw transactions (semicolon-delimited)
├── notebooks/
│   └── new_mba.ipynb                # Jupyter notebook with EDA, Apriori, GPT integration
├── output/
│   ├── bundle_insights.json         # Generated insights (raw JSON)
│   └── bundle_insights.md           # Markdown summary of bundle recommendations
├── README.md                        # This file
├── requirements.txt                 # Python dependencies
└── .env                             # Environment variables (OpenAI API key)

Market Basket Analysis Dataset:  
Kaggle: https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
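Because the raw file is semicolon-delimited and, as noted under Running the Analysis, uses European decimals, a plain CSV read needs two adjustments. The sketch below uses only the standard library (the notebook itself may well use pandas). BillNo is the only column name this README mentions; `Itemname`, `Quantity`, and `Price` are assumed names, and the sketch assumes prices carry no thousands separators.

```python
import csv
import io
from collections import defaultdict


def parse_european_float(text):
    """Convert a comma-decimal string such as '2,55' to 2.55."""
    return float(text.strip().replace(",", "."))


def load_transactions(csv_file):
    """Return (baskets, avg_price) from a semicolon-delimited transaction file."""
    baskets = defaultdict(set)   # BillNo -> set of item names
    prices = defaultdict(list)   # item name -> observed unit prices
    for row in csv.DictReader(csv_file, delimiter=";"):
        if int(row["Quantity"]) <= 0:   # keep only rows with positive quantities
            continue
        item = row["Itemname"].strip()
        baskets[row["BillNo"]].add(item)
        prices[item].append(parse_european_float(row["Price"]))
    avg_price = {item: sum(v) / len(v) for item, v in prices.items()}
    return dict(baskets), avg_price


# Tiny in-memory sample (made up) standing in for the real CSV.
sample = io.StringIO(
    "BillNo;Itemname;Quantity;Price\n"
    "1001;TEAPOT;1;12,50\n"
    "1001;CUP;2;4,25\n"
    "1002;TEAPOT;-1;12,50\n"
    "1003;CUP;1;4,75\n"
)
baskets, avg_price = load_transactions(sample)
print(baskets)
print(avg_price)
```

For the real file, pass an open file handle (opened with newline="") instead of the in-memory sample.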

Setup

  1. Clone the repo and navigate to its root:

    git clone <repo_url>
    cd MarketBasketAnalysis
  2. Create a virtual environment and install requirements:

    python -m venv mba
    source mba/bin/activate    # macOS/Linux
    mba\Scripts\activate       # Windows
    pip install --upgrade pip
    pip install -r requirements.txt
  3. Configure your OpenAI key in .env:

    OPENAI_API_KEY=sk-...
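To fail early when the key is missing, something like the following can run at the top of the notebook. It is a minimal standard-library sketch that assumes simple KEY=VALUE lines in .env; the helper names `load_env_file` and `require_api_key` are hypothetical, and the project may instead rely on a dedicated package for this.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ (existing values win)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def require_api_key():
    """Return the OpenAI key, or raise a clear error if it is not configured."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment")
    return key
```

Calling `load_env_file()` followed by `require_api_key()` before any API request surfaces a missing key immediately rather than partway through the pipeline.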

Running the Analysis

  1. Exploratory Data Analysis (EDA):

    • Load market_basket_dataset.csv; the loader handles the semicolon delimiter and European (comma) decimal separators.
    • Visualize: top items, quantity and price distributions (capped views and log scale), monthly transaction counts.
  2. Apriori & Association Rules:

    • Keep only rows with positive quantities, then group by BillNo to form baskets.
    • One-hot encode and run Apriori with min_support=0.01.
    • Generate rules with lift > 1.2 and confidence > 0.3.
  3. Bundle Pricing Calculation:

    • Compute average price per item from the dataset.
    • Sum the average prices of the antecedent and consequent items to get the raw bundle price.
    • Apply discount (default 15%) for suggested bundle price.
  4. GPT Integration:

    • Serialize the top N rules (default 10) with pricing fields to JSON.
    • Craft a prompt asking GPT to return a JSON array of objects containing antecedents, consequents, bundle_price, and an insight sentence.
    • Call OpenAI’s gpt-3.5-turbo for cost‑effective insights.
  5. Output & Export:

    • Parse GPT response into a pandas DataFrame.
    • Rename columns for business users: Items Bought, Also Bought, Bundle Price, Recommendation.
    • Sanitize text and print a GitHub‑flavored Markdown table.
    • Save the JSON and Markdown to output/.
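Steps 4 and 5 can be sketched without any network access by separating prompt construction, response parsing, and table rendering from the API call itself. The sketch below uses plain dictionaries instead of a pandas DataFrame, and the prompt wording and the helper names (`build_prompt`, `parse_insights`, `to_markdown`) are illustrative assumptions, not the notebook's actual code. The response text is expected to come from the chat completion call, which is not shown.

```python
import json

# JSON key -> business-friendly column label, as listed in step 5.
COLUMNS = [
    ("antecedents", "Items Bought"),
    ("consequents", "Also Bought"),
    ("bundle_price", "Bundle Price"),
    ("insight", "Recommendation"),
]


def build_prompt(rules):
    """Serialize the rules to JSON and wrap them in an instruction (wording is assumed)."""
    return (
        "For each rule below, return a JSON array of objects with the keys "
        "antecedents, consequents, bundle_price, and insight (one sentence that "
        "mentions the bundle price). Return only JSON.\n\n" + json.dumps(rules, indent=2)
    )


def parse_insights(response_text):
    """Extract the JSON array from the model's reply, tolerating surrounding text."""
    start, end = response_text.find("["), response_text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model response")
    return json.loads(response_text[start:end + 1])


def sanitize(value):
    """Flatten lists and remove characters that would break a Markdown table cell."""
    if isinstance(value, list):
        value = ", ".join(str(v) for v in value)
    return str(value).replace("|", "/").replace("\n", " ").strip()


def to_markdown(rows):
    """Render rows as a GitHub-flavored Markdown table."""
    header = "| " + " | ".join(label for _, label in COLUMNS) + " |"
    divider = "| " + " | ".join("---" for _ in COLUMNS) + " |"
    body = [
        "| " + " | ".join(sanitize(row.get(key, "")) for key, _ in COLUMNS) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


# A made-up reply standing in for the model's output.
fake_response = (
    'Here you go:\n[{"antecedents": ["TEAPOT"], "consequents": ["CUP"], '
    '"bundle_price": 14.45, "insight": "Pair the teapot | cup for 14.45."}]'
)
rows = parse_insights(fake_response)
table = to_markdown(rows)
print(table)
```

Keeping the parsing tolerant of extra text around the array is a defensive choice, since a model may not return bare JSON even when asked to.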

Customization

  • Rule Thresholds: Adjust min_support and the lift and confidence cutoffs (defaults: 0.01, 1.2, and 0.3) to refine rule quality.
  • Discount Rate: Change discount_rate in the pricing step to adjust bundle recommendations.
  • Model Selection: Swap model="gpt-3.5-turbo" for gpt-4o or gpt-4 if quality demands justify cost.
  • Country Filter: Add df = df[df.Country == "United Kingdom"] (or another) to focus on specific markets.
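One way to keep these knobs in a single place is a small configuration object. The sketch below is a suggestion rather than the notebook's structure: the class name `PipelineConfig`, the `select_rules` helper, and the choice to rank rules by lift are all assumptions; only the default values come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    min_support: float = 0.01      # Apriori support threshold
    min_lift: float = 1.2          # keep rules with lift above this
    min_confidence: float = 0.3    # keep rules with confidence above this
    discount_rate: float = 0.15    # applied to the raw bundle price
    model: str = "gpt-3.5-turbo"   # model used for the insight text
    top_n: int = 10                # number of rules sent to the model


def select_rules(rules, config):
    """Filter rules by the configured cutoffs and return the top N by lift."""
    kept = [
        r for r in rules
        if r["lift"] > config.min_lift and r["confidence"] > config.min_confidence
    ]
    kept.sort(key=lambda r: r["lift"], reverse=True)   # ranking by lift is an assumption
    return kept[:config.top_n]


# Made-up rules for illustration.
rules = [
    {"lift": 1.5, "confidence": 0.40},
    {"lift": 1.1, "confidence": 0.90},
    {"lift": 2.0, "confidence": 0.35},
    {"lift": 3.0, "confidence": 0.20},
]
default_pick = select_rules(rules, PipelineConfig())
strict_pick = select_rules(rules, PipelineConfig(min_lift=1.8))
print(default_pick)
print(strict_pick)
```

Raising min_lift from 1.2 to 1.8 in this toy example narrows the selection from two rules to one.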

References & Further Reading

  • Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining Association Rules between Sets of Items in Large Databases. SIGMOD.
  • Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to Data Mining. Chapters on Association Analysis.
  • OpenAI API docs: https://platform.openai.com/docs

Happy analyzing and bundling!
