Everyone’s been on a bad date — the kind where you know it’s not going anywhere. This project uses machine learning to predict when two people are unlikely to go on a second date, based on real speed dating data.
Instead of matching for compatibility, we’re flipping the problem — we want to know when people won’t click. This insight is useful for dating platforms, social experiments, or just anyone curious about human behavior.
- Source: Kaggle Speed Dating Dataset
- Size: 8,293 rows
- Target:
match
(1 = mutual interest, 0 = no match) - Class imbalance: ~84% class 0 (non-match), 16% class 1 (match)
Our primary goal was to maximize recall on class 0 (non-matches). Missing a possible match is okay — but incorrectly assuming someone will match when they won’t is what we want to avoid.
Metric | Class 0 Value |
---|---|
Recall | 89% |
Precision | 91% |
F1 Score | 90% |
ROC AUC | 0.821 |
Accuracy | 84% |
Class 0 (non-matches) was correctly captured 89% of the time — making this model extremely effective at filtering out poor matches.
like
funny_o
shared_interests_o
attractive_partner
guess_prob_liked
These were selected using Cohen’s d, p-values, and Random Forest feature importance.
File | Description |
---|---|
data_wrangling.ipynb |
Data cleaning and early exploration |
eda.ipynb |
Exploratory Data Analysis, feature distributions, statistical testing |
Pre-processing Work and Model.ipynb |
Data prep, feature selection, model building & evaluation |