Skip to content

ovoarnav/match-maker

Repository files navigation

Predicting Non-Matches in Speed Dating with Machine Learning

Overview

Everyone’s been on a bad date — the kind where you know it’s not going anywhere. This project uses machine learning to predict when two people are unlikely to go on a second date, based on real speed dating data.

Why This Matters

Instead of matching for compatibility, we’re flipping the problem — we want to know when people won’t click. This insight is useful for dating platforms, social experiments, or just anyone curious about human behavior.

Dataset

  • Source: Kaggle Speed Dating Dataset
  • Size: 8,293 rows
  • Target: match (1 = mutual interest, 0 = no match)
  • Class imbalance: ~84% class 0 (non-match), 16% class 1 (match)

Objective

Our primary goal was to maximize recall on class 0 (non-matches). Missing a possible match is okay — but incorrectly assuming someone will match when they won’t is what we want to avoid.

Best Model: Logistic Regression (Tuned)

Metric Class 0 Value
Recall 89%
Precision 91%
F1 Score 90%
ROC AUC 0.821
Accuracy 84%

Class 0 (non-matches) was correctly captured 89% of the time — making this model extremely effective at filtering out poor matches.

Top Features (by Importance & Significance)

  • like
  • funny_o
  • shared_interests_o
  • attractive_partner
  • guess_prob_liked

These were selected using Cohen’s d, p-values, and Random Forest feature importance.

Project Structure

File Description
data_wrangling.ipynb Data cleaning and early exploration
eda.ipynb Exploratory Data Analysis, feature distributions, statistical testing
Pre-processing Work and Model.ipynb Data prep, feature selection, model building & evaluation

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published