Predicting Non-Matches in Speed Dating with Machine Learning

Overview

Everyone’s been on a bad date — the kind where you know it’s not going anywhere. This project uses machine learning to predict when two people are unlikely to go on a second date, based on real speed dating data.

Why This Matters

Instead of matching for compatibility, we’re flipping the problem — we want to know when people won’t click. This insight is useful for dating platforms, social experiments, or just anyone curious about human behavior.

Dataset

Source: Kaggle Speed Dating Dataset
Size: 8,293 rows
Target: match (1 = mutual interest, 0 = no match)
Class imbalance: ~84% class 0 (non-match), 16% class 1 (match)

Objective

Our primary goal was to maximize recall on class 0 (non-matches). Missing a possible match is okay — but incorrectly assuming someone will match when they won’t is what we want to avoid.

Best Model: Logistic Regression (Tuned)

Metric	Class 0 Value
Recall	89%
Precision	91%
F1 Score	90%
ROC AUC	0.821
Accuracy	84%

Class 0 (non-matches) was correctly captured 89% of the time — making this model extremely effective at filtering out poor matches.

Top Features (by Importance & Significance)

like
funny_o
shared_interests_o
attractive_partner
guess_prob_liked

These were selected using Cohen’s d, p-values, and Random Forest feature importance.

Project Structure

File	Description
`data_wrangling.ipynb`	Data cleaning and early exploration
`eda.ipynb`	Exploratory Data Analysis, feature distributions, statistical testing
`Pre-processing Work and Model.ipynb`	Data prep, feature selection, model building & evaluation

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
README.md		README.md
Speed_Dating_Capstone_Presentation (1).pptx.pdf		Speed_Dating_Capstone_Presentation (1).pptx.pdf
capstone 2 report (2).pdf		capstone 2 report (2).pdf
cleaned_speeddating.csv		cleaned_speeddating.csv
data_wrangling.ipynb		data_wrangling.ipynb
eda.ipynb		eda.ipynb
final_data.csv		final_data.csv
modelling.ipynb		modelling.ipynb
speeddating.csv		speeddating.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Predicting Non-Matches in Speed Dating with Machine Learning

Overview

Why This Matters

Dataset

Objective

Best Model: Logistic Regression (Tuned)

Top Features (by Importance & Significance)

Project Structure

About

Uh oh!

Releases

Packages

Languages

ovoarnav/match-maker

Folders and files

Latest commit

History

Repository files navigation

Predicting Non-Matches in Speed Dating with Machine Learning

Overview

Why This Matters

Dataset

Objective

Best Model: Logistic Regression (Tuned)

Top Features (by Importance & Significance)

Project Structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages