🎥 Cine Suggest: Your Personalized Movie Companion. CineSuggest helps you discover films you'll love based on what you already enjoy. Powered by intelligent recommendations and TMDB data, it's your perfect guide to movie nights.

soumyaDghosh/cine-suggest

🎬 Cine Suggest: A Smart Movie Recommender

Cine Suggest is a content-based movie recommendation app built using Streamlit and trained on the TMDB Top 5000 Movies dataset. It helps users discover similar movies based on overview, genre, cast, crew, and keywords using NLP and cosine similarity.
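To make the idea concrete, here is a minimal, dependency-free sketch of TF-IDF weighting followed by cosine similarity. It is an illustration under assumptions, not the app's actual code: the project lists Scikit-learn, whose `TfidfVectorizer` and `cosine_similarity` would normally replace the hand-written helpers here. The function names, the smoothed IDF formula, and the sample tag strings are all made up for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title, tags_by_title, top_n=2):
    """Rank every other title by similarity of its tag string to `title`."""
    titles = list(tags_by_title)
    vectors = tfidf_vectors([tags_by_title[t] for t in titles])
    query = vectors[titles.index(title)]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in zip(titles, vectors)
        if other != title
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


# Hypothetical tag strings, standing in for overview/genre/cast/crew/keywords.
SAMPLE_TAGS = {
    "Space Saga": "space war rebels sciencefiction adventure",
    "Star Quest": "space rebels adventure sciencefiction robots",
    "Love in Paris": "romance paris comedy",
}
```

Calling `recommend("Space Saga", SAMPLE_TAGS, top_n=1)` ranks "Star Quest" first, because it shares several weighted terms with the query while "Love in Paris" shares none.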

🚀 Features

🔍 Search by movie title

🎯 Content-based recommendations (overview, genres, cast, director)

🧠 Cosine similarity with TF-IDF vectorization

🖼️ Posters fetched live from TMDB API

📱 Mobile-friendly UI with fuzzy search fallback
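The fuzzy search fallback can be sketched with Python's standard `difflib` module. The README does not say which fuzzy-matching approach the app uses, so treat this as one possible approach. The `find_title` helper, the 0.6 cutoff, and the sample titles are assumptions made for illustration.

```python
import difflib


def find_title(query, titles, cutoff=0.6):
    """Return the exact match if present, else the closest fuzzy match, else None."""
    lookup = {t.lower(): t for t in titles}
    key = query.strip().lower()
    if key in lookup:
        return lookup[key]
    # Fall back to the single closest title above the similarity cutoff.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


# Hypothetical catalogue used only for this example.
TITLES = ["Inception", "Interstellar", "The Dark Knight"]
```

With this sketch, a misspelled query such as "Intersteller" still resolves to "Interstellar", while a query with no plausible match returns `None` so the UI can show a "not found" message.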

Datasets

Why TMDB Top 5000 movies data?

  • Balanced Size + Richness: ~5000 movies, with overview, genres, keywords, release dates, and popularity, which is rich enough for a recommendation engine.
  • Modular Structure: Split into two cleanly organized files, movies.csv and credits.csv, which merge easily on the shared movie ID (id in movies.csv, movie_id in credits.csv).
  • Complete Metadata:
    • From movies.csv: title, overview, genres, keywords
    • From credits.csv: movie_id, plus cast, director, writer, etc. extracted from the JSON-formatted fields
  • Realistic for ML/NLP tasks: Overview and genre fields are well suited to content-based recommendations.
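As a rough illustration of the merge and the JSON extraction described above, the sketch below joins movie rows to credit rows and pulls names out of the JSON-formatted columns. It uses only the standard library, whereas the app itself lists Pandas. It assumes movie rows carry `id` and credit rows carry `movie_id`. The helper names, the top-3 cast limit, and the sample rows are hypothetical.

```python
import json


def names(json_text, limit=None):
    """Pull the "name" value out of each object in a JSON-encoded list."""
    items = [entry["name"] for entry in json.loads(json_text)]
    return items[:limit] if limit is not None else items


def director(crew_json):
    """Return the first crew member whose job is Director, or None."""
    for member in json.loads(crew_json):
        if member.get("job") == "Director":
            return member["name"]
    return None


def merge_rows(movies, credits):
    """Inner-join movie rows with credit rows on the movie ID."""
    credits_by_id = {row["movie_id"]: row for row in credits}
    merged = []
    for movie in movies:
        credit = credits_by_id.get(movie["id"])
        if credit is None:
            continue  # skip movies that have no credits row
        merged.append({
            "title": movie["title"],
            "genres": names(movie["genres"]),
            "cast": names(credit["cast"], limit=3),  # top-3 limit is an assumption
            "director": director(credit["crew"]),
        })
    return merged


# Made-up rows shaped like the dataset's JSON-formatted columns.
MOVIES = [{
    "id": 1,
    "title": "Example Movie",
    "genres": '[{"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}]',
}]
CREDITS = [{
    "movie_id": 1,
    "cast": '[{"name": "Actor One"}, {"name": "Actor Two"}]',
    "crew": '[{"job": "Producer", "name": "P. Person"}, '
            '{"job": "Director", "name": "D. Person"}]',
}]
```

In Pandas the join itself would typically be a single `DataFrame.merge` call, with the extraction helpers applied column by column.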

Other datasets considered:

  • IMDB Top 1000 movies database: 1000 movies is a very small sample, given that the original IMDB database contains about 1000x more data.
  • IMDB official database: A huge database (11803648 rows), which is itself an overhead for a project like this. It also lacks details such as overview and plot, and requires additional datasets to get information on cast, crew, etc.

Data Preprocessing Decisions:

  • Lowercased & No Spaces: Fields like genres, crew, and cast are converted to lowercase, and the words within each value are joined by underscores so that every name becomes a single token. This prevents partial token overlap during vectorization, which increases the cosine distance between vectors that are actually unrelated.

    Example: “Neal Caffery” and “Neal Frankenstine” would both contain the word “Neal”, misleading the model into finding them similar.

  • Result: Cleaned, deduplicated token space -> improved cosine distance between distinct vectors.
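The effect of this preprocessing can be checked with a few lines of Python. This is a minimal sketch, and the helper names `to_token` and `shared_tokens` are hypothetical rather than taken from the project.

```python
def to_token(name):
    """Lowercase a multi-word value and join its words with underscores."""
    return "_".join(name.lower().split())


def shared_tokens(a, b):
    """Whitespace-separated tokens that two strings have in common."""
    return set(a.split()) & set(b.split())


raw_a, raw_b = "Neal Caffery", "Neal Frankenstine"

# Without joining, the two names overlap on the token "neal".
overlap_before = shared_tokens(raw_a.lower(), raw_b.lower())

# After joining, each name is a single token and nothing overlaps.
overlap_after = shared_tokens(to_token(raw_a), to_token(raw_b))
```

Here `overlap_before` contains `"neal"` while `overlap_after` is empty, so the two people no longer contribute any similarity to each other's movies.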

🛠️ Tech Stack

🖥️ Frontend: Streamlit

🐍 Backend: Python, Pandas, Scikit-learn, Requests

🎞️ Data: TMDB 5000 Movies Dataset via Kaggle

🧩 API: TMDB API for fetching live posters

🚀 Deployment: Streamlit Community Cloud
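Fetching a poster from the TMDB API might look roughly like the sketch below. It uses the standard library's `urllib` to stay dependency-free, whereas the stack above lists Requests. The function names, the `w500` image size, and the timeout are assumptions for illustration. The API key is passed in as a parameter rather than read from Streamlit secrets.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.themoviedb.org/3"
IMAGE_BASE = "https://image.tmdb.org/t/p/w500"  # w500 is an assumed poster width


def movie_details_url(movie_id, api_key):
    """Build the TMDB movie-details URL for one movie ID."""
    query = urllib.parse.urlencode({"api_key": api_key})
    return f"{API_BASE}/movie/{movie_id}?{query}"


def poster_url(poster_path):
    """Turn a TMDB poster_path such as "/abc.jpg" into a full image URL."""
    return f"{IMAGE_BASE}{poster_path}" if poster_path else None


def fetch_poster(movie_id, api_key, timeout=10):
    """Call the TMDB API and return the poster URL, or None if there is no poster."""
    url = movie_details_url(movie_id, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        details = json.load(resp)
    return poster_url(details.get("poster_path"))
```

Keeping URL construction separate from the network call makes the pure parts easy to test and lets the app cache or retry `fetch_poster` independently.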

📦 Setup & Run Locally

  1. Clone the repo

    git clone https://github.com/soumyaDghosh/cine-suggest.git
    cd cine-suggest
  2. Install dependencies

    pip install -r requirements.txt
  3. Set your TMDB API key

    Create a .streamlit/secrets.toml file:

      TMDB_API_KEY = "your_api_key_here"
    
  4. Run the app

    streamlit run app.py
    

📛 License

This project is licensed under the GNU AGPL v3.0. You are free to use, modify, and distribute this software, but any derivative work must also be open-sourced under the same license, even if it’s hosted as a web service.

✨ Demo & Credits

https://cinesuggest-soumyadghosh.streamlit.app/

Built by Soumyadeep Ghosh as part of a content-based recommendation exploration project. TMDB data © TMDB and respective contributors.
