This project was a part of the Masters of Science in Business Analytics at the University of Montana.
This project aims to use K-Nearest Neighbors (KNN) to classify a local tavern's customer base. These classifications can be used to target audiences for marketing purposes to increase RIO, retain customers and learn more business insights.
While SKlearn in Python is very robust, this project uses the Euclidean Distance method in Python by hand to classify an audience. After classifying the audience, a segmented dataset was used to compare the accuracy of my segmentation versus the correct set.
Plots made with Seaborn were used to help understand the accuracy and the dataset. In addition, data visuals like this help understand the broader concept of the segments.
-
Using KNN to segment customer audiences.
-
Foundational understanding of KNN and how it works by hand.
-
KNN comes with limitations but is still powerful with smaller datasets.
customer_id
: a unique ID for the customermost_popular_category
: the most popular drink category for the customerrelationship_days
: the number of days between the first and last transaction for the customer.total_spend
: the lifetime total spend by the customerbeverage_categories
: the number of distinct beverage categories the customer has purchased from.segment
: a four-level categorical variable that segments the customers.