This project analyzes weekly transaction data from Fresco Supermarket, one of the UK’s largest grocery retailers. The aim is to identify patterns and predictors of customer basket value, classifying shoppers as either Low Spenders (£50 or less per basket) or High Spenders (over £50), using a range of business analytics tools.
- Source: Fresco Supermarket Loyalty Cardholder Weekly Data (26-week period)
- Sample: Randomly selected loyalty cardholders across three channels: convenience stores, superstores, and the online platform
- Variables:
  - Gender: Customer gender (Male/Female)
  - Age: Customer age in years
  - Store_Type: Type of store (Convenience, Superstore, Online)
  - Shopping_Frequency: Number of shopping visits per week
  - Basket_Value: Total basket spend (£)
  - Basket_Consistency: Predominant product type (e.g., Value, Brand, Fresco Top)
  - HighSpender: Binary target variable; 1 = High Spender (>£50), 0 = Low Spender (≤£50)
- File: `Portfolio-Task-1-Short Data_Fresco1.xlsx`
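The variable list can be mirrored as a typed record. This makes the category and range checks explicit at load time. The sketch below is minimal and assumes Python 3.9+. `CustomerRecord` and its validation rules are hypothetical and are not part of the original workbook. Basket_Consistency stays free-form because the source gives its levels only as examples.

```python
from dataclasses import dataclass

# Levels taken from the variable list above; any other value is treated as a data error.
GENDERS = {"Male", "Female"}
STORE_TYPES = {"Convenience", "Superstore", "Online"}


@dataclass
class CustomerRecord:
    """One loyalty-cardholder row (hypothetical class mirroring the documented variables)."""
    gender: str
    age: float
    store_type: str
    shopping_frequency: float  # shopping visits per week
    basket_value: float        # total basket spend in GBP
    basket_consistency: str    # e.g. "Value", "Brand", "Fresco Top" (not enumerated in the source)

    def __post_init__(self) -> None:
        # Reject categories outside the documented levels.
        if self.gender not in GENDERS:
            raise ValueError(f"unexpected gender: {self.gender!r}")
        if self.store_type not in STORE_TYPES:
            raise ValueError(f"unexpected store type: {self.store_type!r}")
        # Numeric fields cannot be negative.
        if self.age < 0 or self.shopping_frequency < 0 or self.basket_value < 0:
            raise ValueError("age, shopping frequency and basket value must be non-negative")
```

For example, `CustomerRecord("Female", 34, "Online", 2.0, 62.5, "Brand")` constructs normally. A store type such as `"Kiosk"` raises `ValueError`.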
- Classify Fresco customers as Low or High Spenders based on demographic and behavioural data.
- Identify key predictors of high-value spending for targeted marketing.
- Explored and cleaned the dataset for analysis.
- Applied logistic regression to predict HighSpender status.
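- Evaluated model performance using classification tables (accuracy, sensitivity, specificity), the omnibus likelihood-ratio test, and pseudo-R² measures (Cox & Snell, Nagelkerke).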
- Classification Accuracy: The model correctly classified 94.7% of all customers.
- Model Fit: Strong fit (Nagelkerke R² = 0.873).
- Statistical Significance: The model is highly significant (Omnibus test, p < 0.001).
- Target high-frequency shoppers and customers with consistent preferences for premium (Fresco Top) or branded products for upselling and loyalty campaigns.
- Use customer age and store type to design segment-specific promotions.
- Monitor shopping frequency as a key predictor of spending behaviour.
- Loaded the dataset from Excel (`Portfolio-Task-1-Short Data_Fresco1.xlsx`).
- Inspected for missing values and outliers. Minimal cleaning was required; outliers in basket value were retained to preserve spend variation.
- Engineered `HighSpender` as the binary target variable (1 if Basket_Value > £50; else 0).
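The preparation steps above are the missing-value check, the outlier inspection and the target construction. They can be sketched without third-party libraries, under these assumptions:

- Rows are plain dicts keyed by the workbook's column names, for example as produced by `pandas.read_excel(...).to_dict("records")`.
- The helper names are illustrative.
- Outliers are flagged with the 1.5 × IQR (Tukey) rule, because the report does not say how they were identified.

Outliers are only flagged here, not dropped, which matches the decision to retain them.

```python
import math
import statistics
from typing import Any

HIGH_SPEND_THRESHOLD = 50.0  # GBP; strictly above this => High Spender


def is_missing(value: Any) -> bool:
    """Treat None, empty strings and float NaN (how pandas reads blank Excel cells) as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def count_missing(rows: list[dict[str, Any]], columns: list[str]) -> dict[str, int]:
    """Number of missing cells per column."""
    return {c: sum(is_missing(r.get(c)) for r in rows) for c in columns}


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]; flagged for review, not removed."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def add_high_spender(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Add the binary target: 1 if Basket_Value > £50, else 0 (exactly £50 is a Low Spender)."""
    for row in rows:
        row["HighSpender"] = int(row["Basket_Value"] > HIGH_SPEND_THRESHOLD)
    return rows
```

For instance, `iqr_outliers([10, 12, 14, 15, 16, 18, 20, 300])` flags only `300`. The strict `>` comparison puts a basket of exactly £50 in the Low Spender group, consistent with the ≤£50 definition.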
- Logistic regression was chosen for its interpretability and suitability for binary outcomes.
- Alternative models (e.g., decision trees) were considered, but logistic regression provided clear coefficient interpretations and robust diagnostics.
- Fitted a logistic regression model using predictors: Age, Gender, Store_Type, Shopping_Frequency, and Basket_Consistency.
- Assumptions checked:
  - Linearity of the logit for the continuous predictors (Age, Shopping_Frequency)
  - No perfect multicollinearity detected among predictors
  - All predictors were either categorical or continuous, as required
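The sketch below fits the same kind of model without external dependencies. It shows two steps: dummy coding for the categorical predictors, and maximum-likelihood fitting by Newton-Raphson. It is a sketch under stated assumptions, not the report's actual procedure:

- `build_design`, `fit_logistic` and the choice of the alphabetically first level as each reference category are illustrative. The original analysis may have used different software and reference levels.
- In practice, a library such as statsmodels (`statsmodels.api.Logit`) would also report standard errors and Wald tests.

```python
import math

NUMERIC = ["Age", "Shopping_Frequency"]
CATEGORICAL = ["Gender", "Store_Type", "Basket_Consistency"]


def dummy_code(value, levels):
    """Treatment coding: one 0/1 indicator per level, dropping levels[0] as the reference."""
    return [1.0 if value == lvl else 0.0 for lvl in levels[1:]]


def build_design(rows):
    """Return (X, levels); the reference category is the alphabetically first level (an assumption)."""
    levels = {c: sorted({r[c] for r in rows}) for c in CATEGORICAL}
    X = []
    for r in rows:
        x = [float(r[c]) for c in NUMERIC]
        for c in CATEGORICAL:
            x += dummy_code(r[c], levels[c])
        X.append(x)
    return X, levels


def _sigmoid(eta):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns [intercept, b1, b2, ...] on the log-odds scale.
    """
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score vector X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX, W = diag(mu(1 - mu))
        for xi, yi in zip(rows, y):
            mu = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = _solve(info, grad)  # Newton step: (X'WX)^-1 X'(y - mu)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

Typical use is `X, levels = build_design(rows)` followed by `fit_logistic(X, [r["HighSpender"] for r in rows])`. Newton-Raphson has no finite solution if the classes are perfectly separable. In that case the coefficients grow without bound, and a library fit would warn.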
- Classification Table: Overall accuracy 94.7%; sensitivity and specificity both above 93%.
- Omnibus Test: The model is highly significant (p < 0.001).
- Model Fit: Cox & Snell R² = 0.650, Nagelkerke R² = 0.873 (very strong model fit).
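These evaluation statistics follow from standard definitions. The classification figures come from a confusion matrix. The fit statistics come from the log-likelihoods of the null (intercept-only) and fitted models. Assumptions in this sketch:

- The function names are hypothetical.
- Predicted class is 1 when the fitted probability is at least 0.5, the usual default cut-off.

A p-value for the omnibus statistic would come from the chi-square survival function, e.g. `scipy.stats.chi2.sf(stat, df)`, with `df` equal to the number of estimated slope parameters.

```python
import math


def predict_class(probs, cutoff=0.5):
    """Convert fitted probabilities into 0/1 predictions (0.5 cut-off assumed)."""
    return [1 if p >= cutoff else 0 for p in probs]


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (High Spenders caught) and specificity (Low Spenders caught)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def pseudo_r2(ll_null, ll_model, n):
    """Cox & Snell and Nagelkerke R^2 from natural-log likelihoods.

    Cox & Snell: 1 - (L0 / LM)^(2/n)
    Nagelkerke:  Cox & Snell / (1 - L0^(2/n)), rescaled so its maximum is 1.
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell, cox_snell / max_cox_snell


def omnibus_lr_statistic(ll_null, ll_model):
    """Likelihood-ratio chi-square of the omnibus test: 2 * (LL_model - LL_null)."""
    return 2.0 * (ll_model - ll_null)
```

The Nagelkerke rescaling explains why it is always at least as large as Cox & Snell. With a roughly balanced target, the Cox & Snell maximum is close to 0.75, so 0.650 rescales to a value near 0.87.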
- Shopping frequency was the strongest positive predictor: each additional visit per week increased the odds of being a High Spender.
- Age showed only a weak association with spending: controlling for other factors, younger customers were slightly more likely to be High Spenders.
- Store type and basket consistency (premium brands or Fresco Top products) also contributed positively to HighSpender classification.
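Statements such as "each additional visit per week increased the odds" come from turning a log-odds coefficient b into an odds ratio exp(b). The report does not list the fitted coefficients, so the values used below are hypothetical and serve only to show the arithmetic.

```python
import math


def odds_ratio(coef: float, delta: float = 1.0) -> float:
    """Multiplicative change in the odds of being a High Spender for a `delta`-unit increase."""
    return math.exp(coef * delta)


def pct_change_in_odds(coef: float, delta: float = 1.0) -> float:
    """The same effect expressed as a percentage change in the odds."""
    return (odds_ratio(coef, delta) - 1.0) * 100.0
```

Consider a hypothetical Shopping_Frequency coefficient of 0.8. `odds_ratio(0.8)` is about 2.23, so each extra weekly visit would more than double the odds. A hypothetical Age coefficient of -0.02 gives `odds_ratio(-0.02, delta=10)` of about 0.82, an 18% drop in the odds per decade. These are changes in odds, not in probability.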
- Classification accuracy: 94.7% (see table above).
- The pseudo-R² values are very high for a logistic regression, suggesting strong explanatory power. However, they may also reflect dataset characteristics (the sample, feature engineering, or possible overfitting on a small subsample).
- Confusion matrix shows balanced sensitivity and specificity (false positives and negatives are minimal).
- The sample size is modest for generalization: results are robust for this subsample but may require validation on larger or full datasets.
- Some variables (such as basket consistency) are self-reported or categorical; future work could further refine these with more granular product data.
- The model assumes customer behaviour is stable over the observed period; seasonal effects and promotions are not modeled here.
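Stratified k-fold cross-validation is one way to act on the validation and overfitting concerns. It refits the model on k-1 folds and scores the held-out fold. Each fold keeps roughly the same High/Low Spender ratio. `stratified_k_fold` and its defaults (k = 5, a fixed seed) are illustrative choices, not part of the original analysis.

```python
import random


def stratified_k_fold(y, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with each class spread evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == cls]
        rng.shuffle(members)
        # Deal this class's rows round-robin so every fold gets a similar share.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Pass each `(train, test)` pair to the fitting routine in use. Accuracy, sensitivity and specificity are then averaged across folds, which gives a less optimistic estimate than in-sample figures.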
- This analysis demonstrates that Fresco can accurately predict high-value spenders using standard loyalty cardholder data.
- Logistic regression provides interpretable and actionable insights for marketing management, especially around frequency and channel preference.
- Further improvements could involve testing ensemble models or time-series features as more data becomes available.