This project predicts yearly customer spending based on their activity on an e-commerce platform. It includes data exploration, regression modeling (Linear, Ridge, and Lasso), evaluation, and a deployment-ready Streamlit dashboard.
- Languages: Python 3
- Libraries:
  - Data Handling: pandas, numpy
  - Visualization: matplotlib, seaborn
  - Modeling: scikit-learn, statsmodels, joblib
  - App Interface: streamlit
- Source: Ecommerce Customers.csv
- Features Used:
  - Avg. Session Length
  - Time on App
  - Time on Website
  - Length of Membership
  - Yearly Amount Spent (target)
- Scatterplots to explore relationships (e.g. Time on App vs Yearly Spending)
- Barplots to analyze binned behavior
- Pairplots & Jointplots for pairwise feature insights
- Distribution plots for residual analysis
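
As a rough illustration of the scatterplot and binned barplot steps, the sketch below uses pandas and matplotlib only. The data is synthetic and stands in for `Ecommerce Customers.csv`; the slope, noise level, and quartile binning are arbitrary assumptions, not choices taken from the project.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (values are made up).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"Time on App": rng.normal(12, 1, n)})
df["Yearly Amount Spent"] = 40 * df["Time on App"] + rng.normal(0, 10, n)

# Bin the feature into quartiles and average the target within each bin.
df["App Time Bin"] = pd.qcut(df["Time on App"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
binned = df.groupby("App Time Bin", observed=True)["Yearly Amount Spent"].mean()

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Scatterplot: raw relationship between the feature and the target.
ax_scatter.scatter(df["Time on App"], df["Yearly Amount Spent"], s=8)
ax_scatter.set_xlabel("Time on App")
ax_scatter.set_ylabel("Yearly Amount Spent")

# Barplot: mean spending per quartile of app time.
binned.plot.bar(ax=ax_bar)
ax_bar.set_ylabel("Mean Yearly Amount Spent")

fig.tight_layout()
# Call fig.savefig("exploration.png") or switch backend and plt.show() to view it.
```
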
- Linear regression performed using both scikit-learn and statsmodels
- Extracted coefficients and evaluated performance (R² score)
- Added Ridge and Lasso regularization to improve generalization
- Used pipelines with standard scaling
- Compared metrics: MAE, MSE, RMSE, R²
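
The comparison could look roughly like the sketch below, which wraps each estimator in a pipeline with `StandardScaler` and collects the four metrics in one table. The data is synthetic, and the `alpha` values are placeholder assumptions, not the project's tuned settings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (values are made up).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "Avg. Session Length": rng.normal(33, 1, n),
    "Time on App": rng.normal(12, 1, n),
    "Time on Website": rng.normal(37, 1, n),
    "Length of Membership": rng.normal(3.5, 1, n),
})
y = (25 * X["Avg. Session Length"] + 40 * X["Time on App"]
     + 60 * X["Length of Membership"] + rng.normal(0, 10, n))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# alpha values are illustrative placeholders.
models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
}

rows = {}
for name, estimator in models.items():
    # Scaling lives inside the pipeline, so it is fitted on training data only.
    pipe = make_pipeline(StandardScaler(), estimator)
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    rows[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_test, pred),
    }

results = pd.DataFrame.from_dict(rows, orient="index")
print(results)
```

Scaling matters for Ridge and Lasso because their penalties act on coefficient magnitudes, which depend on feature units.
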
- Used Variance Inflation Factor (VIF) to check for multicollinearity among features
- Metrics Used:
  - Mean Absolute Error (MAE)
  - Mean Squared Error (MSE)
  - Root Mean Squared Error (RMSE)
  - R² Score
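
To make the four metrics concrete, here is a small worked example computed by hand with NumPy. The three target and prediction values are invented for the arithmetic only and are unrelated to the project's results.

```python
import numpy as np

# Toy values chosen so the arithmetic is easy to follow.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

residuals = y_true - y_pred                     # [1, 0, -2]

mae = np.mean(np.abs(residuals))                # (1 + 0 + 2) / 3 = 1.0
mse = np.mean(residuals ** 2)                   # (1 + 0 + 4) / 3 = 5/3
rmse = np.sqrt(mse)                             # square root of MSE

ss_res = np.sum(residuals ** 2)                 # residual sum of squares = 5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares = 8
r2 = 1 - ss_res / ss_tot                        # 1 - 5/8 = 0.375

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

MAE and RMSE are in the units of the target, while RMSE weights large errors more heavily because the residuals are squared before averaging.
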
- Plots:
  - Actual vs Predicted
  - Residual distribution
  - Normal Q-Q plot
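
The three diagnostic plots might be produced along these lines. The targets and predictions below are simulated placeholders, and `scipy.stats.probplot` is one possible way to draw the Q-Q plot; the project may have used a different routine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Simulated targets and predictions (placeholders, not project output).
rng = np.random.default_rng(0)
y_true = rng.normal(500, 80, 200)
y_pred = y_true + rng.normal(0, 10, 200)
residuals = y_true - y_pred

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Actual vs Predicted: points near the diagonal indicate a good fit.
axes[0].scatter(y_true, y_pred, s=8)
lims = [y_true.min(), y_true.max()]
axes[0].plot(lims, lims, color="red")
axes[0].set_xlabel("Actual")
axes[0].set_ylabel("Predicted")
axes[0].set_title("Actual vs Predicted")

# Residual distribution: should look roughly symmetric around zero.
axes[1].hist(residuals, bins=30)
axes[1].set_xlabel("Residual")
axes[1].set_title("Residual distribution")

# Normal Q-Q plot: points close to the line suggest normal residuals.
stats.probplot(residuals, dist="norm", plot=axes[2])
axes[2].set_title("Normal Q-Q plot")

fig.tight_layout()
# Call fig.savefig("diagnostics.png") to write the figure to disk.
```
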
This project builds and deploys a machine learning model to predict yearly e-commerce customer spending based on usage metrics like session length, time on app, and membership duration.
Key highlights:
- Model Used: Linear Regression (with comparisons to Ridge & Lasso)
- Best Model R² Score: 0.98, indicating a strong predictive fit
- Mean Absolute Error (MAE): 7.54
- Root Mean Squared Error (RMSE): 9.36