Unsupervised ML: Athlete Recommender & Analytics for Paris 2024
📌 Project Overview
This project applies unsupervised machine learning to analyze athletes connected to the Paris 2024 Olympic & Paralympic Games and to recommend similar ones. Using clustering and similarity-based methods, we built an athlete recommender system that identifies athletes with similar profiles based on key performance, demographic, and social influence attributes.
🚀 Features
- Athlete Similarity Recommender: Finds and suggests athletes with similar profiles.
- Data Preprocessing & Normalization: Encoding, scaling, and handling categorical attributes.
- Chatbot Integration: Users can query the system using text inputs (e.g., "Football male athletes for JO2024 with Instagram influence >10K").
- Exploratory Data Analysis (EDA): Understanding athlete distribution based on multiple attributes.
- Visualization: Graphical representations of athlete clusters and similarities.
📊 Dataset
The dataset contains 600 athletes with 82 features, including:
- Demographics: Age, Gender
- Performance Metrics: Medals (Gold, Silver, Bronze)
- Competition Status: Participation in Paris 2024 (Qualified, Non-Eligible, etc.)
- Sport & Para-sport Category: Encoded categorical variables
- Influence Metrics: Social media following (normalized values)
🛠 Methodology
1️⃣ Data Preprocessing & Encoding
- Encoded categorical variables using one-hot encoding (e.g., sports, gender, status).
- Normalized numerical features using MinMaxScaler (Age, Followers, etc.).
- Kept only relevant features with no missing values for scoring and similarity analysis.
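The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the toy data, the column names, and the `build_features` helper are all hypothetical, and it assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data; the column names are illustrative only.
athletes = pd.DataFrame({
    "sport": ["Sailing", "Sailing", "Judo", "Football"],
    "gender": ["F", "M", "F", "M"],
    "age": [25, 31, 28, 22],
    "followers": [12000, 3500, 80000, 15000],
    "coach": ["X", None, "Y", None],  # has gaps, so it is dropped
})

def build_features(df, categorical, numerical):
    """One-hot encode categorical columns and min-max scale numerical ones."""
    # Keep only the requested columns that have no missing values.
    cat = [c for c in categorical if df[c].notna().all()]
    num = [c for c in numerical if df[c].notna().all()]
    # One 0/1 column per category value, e.g. "sport_Sailing".
    encoded = pd.get_dummies(df[cat], dtype=float)
    # Rescale each numerical column to the [0, 1] range.
    scaled = pd.DataFrame(
        MinMaxScaler().fit_transform(df[num]), columns=num, index=df.index
    )
    return pd.concat([encoded, scaled], axis=1)

X = build_features(athletes, ["sport", "gender", "coach"], ["age", "followers"])
print(X.round(2))
```

Scaling to [0, 1] puts the numerical columns on the same range as the one-hot columns, so that a raw follower count does not dominate the similarity computation.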
2️⃣ Feature Engineering & Similarity Computation
- Built a feature matrix (X) combining sports, performance, and influence data.
- Used cosine similarity to measure athlete likeness.
- Identified the top 5 most similar athletes for any given athlete.
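As a rough sketch of the similarity step, the function below ranks athletes by cosine similarity to a query athlete using NumPy only. The function name `top_k_similar`, the placeholder names, and the toy feature vectors are assumptions made for illustration; the project's real feature matrix has many more columns.

```python
import numpy as np

def top_k_similar(X, names, query, k=5):
    """Return the k athletes most similar to `query` by cosine similarity."""
    X = np.asarray(X, dtype=float)
    # Normalize each row to unit length; guard against all-zero rows.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = X / norms
    i = names.index(query)
    # The dot product of unit vectors is the cosine similarity.
    scores = unit @ unit[i]
    scores[i] = -np.inf  # never recommend the query athlete itself
    order = np.argsort(-scores)[: min(k, len(names) - 1)]
    return [(names[j], float(scores[j])) for j in order]

# Hypothetical feature rows (e.g. one-hot sport, scaled medals, scaled followers).
names = ["A", "B", "C", "D"]
X = [[1, 0, 1.0], [1, 0, 0.9], [0, 1, 0.0], [1, 1, 1.0]]
print(top_k_similar(X, names, "A", k=2))
```

Cosine similarity compares the direction of two feature vectors rather than their magnitude, which suits a matrix that mixes one-hot columns with scaled numerical columns.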
3️⃣ Chatbot for Athlete Search & Recommendations
- Developed a query-based chatbot that filters athletes based on textual queries.
- Users can search for athletes using conditions like sport type, gender, status, and social influence.
- Returns a ranked list of matching athletes with their details.
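One simple way to turn a text query into filters is keyword and pattern matching, sketched below. The parsing rules, the `parse_query` and `search` helpers, the record fields, and the choice to rank by follower count are all assumptions for illustration; the project's chatbot may work differently. A status filter (e.g. Qualified) could be added in the same way.

```python
import re

# Hypothetical records; names and values are placeholders, not real data.
ATHLETES = [
    {"name": "Athlete 1", "sport": "Football", "gender": "M", "followers": 50000},
    {"name": "Athlete 2", "sport": "Football", "gender": "M", "followers": 8000},
    {"name": "Athlete 3", "sport": "Football", "gender": "F", "followers": 120000},
    {"name": "Athlete 4", "sport": "Judo", "gender": "M", "followers": 30000},
    {"name": "Athlete 5", "sport": "Football", "gender": "M", "followers": 15000},
]

def parse_query(query, sports):
    """Extract sport, gender, and minimum-follower filters from free text."""
    q = query.lower()
    filters = {}
    for sport in sports:
        if sport.lower() in q:
            filters["sport"] = sport
    # Word boundaries keep "male" from matching inside "female".
    if re.search(r"\b(female|women|woman)\b", q):
        filters["gender"] = "F"
    elif re.search(r"\b(male|men|man)\b", q):
        filters["gender"] = "M"
    # Thresholds such as ">10K" or "> 2.5m".
    m = re.search(r">\s*(\d+(?:\.\d+)?)([km]?)\b", q)
    if m:
        scale = {"": 1, "k": 1_000, "m": 1_000_000}[m.group(2)]
        filters["min_followers"] = float(m.group(1)) * scale
    return filters

def search(query, athletes):
    """Return matching athletes, ranked by follower count (descending)."""
    f = parse_query(query, sorted({a["sport"] for a in athletes}))
    hits = [
        a for a in athletes
        if f.get("sport") in (None, a["sport"])
        and f.get("gender") in (None, a["gender"])
        and a["followers"] > f.get("min_followers", -1)
    ]
    return sorted(hits, key=lambda a: a["followers"], reverse=True)

query = "Football male athletes for JO2024 with Instagram influence >10K"
for athlete in search(query, ATHLETES):
    print(athlete["name"], athlete["followers"])
```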
📈 Visualization & Analysis
- Heatmaps to illustrate athlete similarity scores.
- Clustering of athletes based on performance and social influence.
- Graphical representation of top athlete recommendations.
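The clustering step could look like the sketch below. The clustering algorithm is not named in this README, so k-means from scikit-learn is used here purely as an assumption, and the two-column feature matrix (scaled performance and scaled social influence) is hypothetical toy data.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_athletes(X, n_clusters=2, seed=0):
    """Assign each athlete (row of X) to one of n_clusters groups."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(np.asarray(X, dtype=float))

# Hypothetical rows: [scaled performance, scaled social influence].
X_toy = [[0.90, 0.80], [0.85, 0.90], [0.10, 0.05], [0.00, 0.10]]
labels = cluster_athletes(X_toy)
print(labels)  # cluster ids are arbitrary; only the grouping is meaningful
```

The resulting labels can then be used to color a scatter plot or to order the rows of a similarity heatmap so that each cluster appears as a block.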
🏆 Example: Finding Similar Athletes
Given Hélène Noesmoen, the system recommends the following five most similar athletes:
- Axel Mazella
- Louise Cervera
- Lou Berthomieu
- Jean-Baptiste Bernaz
- Charline Picon
These recommendations are based on shared attributes like sports category, performance, and social influence.
📌 Future Enhancements
- Integrate deep learning for enhanced similarity detection.
- Expand the chatbot with natural language processing (NLP) for better query understanding.
- Add real-time athlete data updates.
Made with ❤️ for Paris 2024 Data Analysis & Recommender Systems 🏅