This project is a Capstone Project completed as part of the Edureka Data Science with SQL course.
It showcases a complete movie recommendation system using the MovieLens dataset.
The project implements three types of recommendation systems:
1️⃣ Popularity-Based Recommender:
Recommends top-rated movies in a selected genre with a minimum review threshold.
2️⃣ Content-Based Recommender:
Suggests movies similar to a selected one based on genre similarity using TF-IDF and Cosine Similarity.
3️⃣ Collaborative Filtering Recommender:
Provides personalized movie recommendations based on ratings from similar users.
Interactive inputs and colorful visualizations make it easy to experiment with different genres, movies, and user IDs.
The goal of this project is to apply data science techniques to create personalized content discovery systems, similar to platforms like Netflix or Prime Video.
- Python: Programming language
- Jupyter Notebook: Interactive coding environment
- Pandas: Data manipulation and analysis
- NumPy: Numerical computing
- scikit-learn: Machine learning models and similarity computations
- Plotly: Interactive data visualizations
- MovieLens Dataset: Real-world dataset for movies and ratings
- SQL (Course Context): While this project uses CSVs, concepts from SQL querying and data extraction are applied as part of the Edureka Data Science with SQL course
movie-recommendation-system/
├── README.md
├── LICENSE (MIT)
├── movie_recommendation_system.ipynb
├── movies.csv
├── ratings.csv
- README.md: This documentation file
- LICENSE: Open-source MIT License
- movie_recommendation_system.ipynb: Main Jupyter Notebook with code and explanations
- movies.csv: Movie metadata (ID, title, genres)
- ratings.csv: User ratings data (user ID, movie ID, rating, timestamp)
- 🎭 Genre Distribution: Visualized as a colorful horizontal bar chart
- ⭐ Rating Distribution: Visualized as a histogram with interactive features
These plots are generated using Plotly and are fully interactive in the notebook.
- Input: Genre, Minimum number of reviews, Number of recommendations
- Output: Top-rated movies in the selected genre
- Input: Movie title, Number of similar movies to recommend
- Output: Similar movies based on genre similarity using TF-IDF and Cosine Similarity
- Input: User ID, Number of similar users (K), Number of recommendations
- Output: Personalized movie recommendations based on user similarity
1️⃣ Clone this repository: git clone https://github.com/Sravyatogarla/Movie-recommendation-system.git
The data used in this project comes from the MovieLens dataset, a widely used dataset in recommender systems research.
- movies.csv: Contains
movieId
,title
, andgenres
. - ratings.csv: Contains
userId
,movieId
,rating
, andtimestamp
.
- Key Highlights
-
Interactive user inputs for all three recommdation models
-
Clear and colorful data visualizations using Plotly
-
Clean and well-commented code with explanations
4.Real-world application of data science concepts from the Edureka Data Science with SQL course
- Future Enhancements
Add advanced collaborative filtering (SVD, matrix factorization)
Build a web-based interface using Streamlit or Flask
Incorporate user feedback for dynamic recommendations
- License
This project is licensed under the MIT License. See the LICENSE file for full details.
👨💻 Author Sravya Togarla
Edureka Data Science with SQL Learner
Connect on LinkedIn www.linkedin.com/in/sravya-togarla
Course Details This couse covers following topics :
-
Python Programming
-
Probabilty and Statistics
-
Supervised Machine Learning
-
Unsupervised Machine Learning
-
Deep learning
-
Natural language processing
-
Data visualization using Tableau
-
SQL for Data Science
✅ Conclusion
This Movie Recommendation System demonstrates how data science can enhance user experience through personalized content discovery. With practical implementations of popularity-based, content-based, and collaborative filtering models, this project bridges the gap between learning and real-world applications.