ravi46931/movie_recommender_system

Movie Recommendation System

This document describes a movie recommendation system designed to help users discover movies based on their preferences.

Link:

https://movie-recommend.azurewebsites.net

demo

Technologies and Frameworks

  • Flask
  • Azure

Dataset

The system builds its dataset around the IMDb movie database (https://www.imdb.com/) in two phases:

Phase 1: GroupLens Data

  • Data source: GroupLens: https://grouplens.org/datasets/
  • Files used:
    • movies.csv (movieId, title, genres)
    • links.csv (movieId, imdbId, tmdbId)
    • ratings.csv (userId, movieId, rating)
  • Preprocessing:
    • Removed unused files (genome-*.csv, tags.csv)
    • Filtered ratings.csv:
      • Removed users with fewer than 100 ratings.
      • Removed movies with fewer than 100 ratings.
      • Dropped userId column.
      • Calculated average rating per movie.
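
The filtering steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the function name `average_ratings` and the tuple layout are hypothetical, and it assumes users are filtered first and movie counts are then taken over the remaining rows (the list order suggests this, but the source does not state it).

```python
from collections import Counter, defaultdict

def average_ratings(rows, min_count=100):
    """rows: iterable of (userId, movieId, rating) tuples from ratings.csv."""
    rows = list(rows)

    # 1. Keep only users with at least `min_count` ratings.
    per_user = Counter(user for user, _, _ in rows)
    rows = [r for r in rows if per_user[r[0]] >= min_count]

    # 2. Keep only movies with at least `min_count` ratings
    #    (assumption: counted after the user filter).
    per_movie = Counter(movie for _, movie, _ in rows)
    rows = [r for r in rows if per_movie[r[1]] >= min_count]

    # 3. Drop userId and average the remaining ratings per movie.
    by_movie = defaultdict(list)
    for _, movie, rating in rows:
        by_movie[movie].append(rating)
    return {movie: sum(vals) / len(vals) for movie, vals in by_movie.items()}
```

The threshold is a parameter so the logic can be checked on a handful of rows; the same steps map directly onto a group-by in a dataframe library.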

Phase 2: Cinemagoer Library

  • Fetches additional movie attributes using imdbId:
    • plot
    • cast
    • crew
    • director
    • countries
    • languages
    • production companies
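
A rough sketch of how these attributes might be pulled with Cinemagoer is shown below. The helper names (`pad_imdb_id`, `names`, `extract_attributes`, `fetch_attributes`) are hypothetical, and the key names are the ones Cinemagoer commonly exposes on a movie object. The crew field is left out because the source does not say which Cinemagoer keys it was built from. The live lookup needs network access and the cinemagoer package, so it is isolated in its own function.

```python
def pad_imdb_id(imdb_id):
    """If imdbId was parsed as an integer, restore the leading zeros (7 digits minimum)."""
    return str(imdb_id).zfill(7)

def names(items):
    """Person and Company entries expose their display name under the 'name' key."""
    return [item["name"] for item in items or []]

def extract_attributes(movie):
    """Collect the listed fields from a Cinemagoer movie (or any mapping with .get)."""
    plot = movie.get("plot") or []
    return {
        "plot": plot[0] if plot else "",  # keep only the first plot summary
        "cast": names(movie.get("cast")),
        "director": names(movie.get("director")),
        "countries": list(movie.get("countries") or []),
        "languages": list(movie.get("languages") or []),
        "production companies": names(movie.get("production companies")),
    }

def fetch_attributes(imdb_id):
    """Live lookup; requires network access and `pip install cinemagoer`."""
    from imdb import Cinemagoer  # imported lazily so the helpers above work offline
    movie = Cinemagoer().get_movie(pad_imdb_id(imdb_id))
    return extract_attributes(movie)
```

Missing keys fall back to empty values, so one incomplete IMDb entry does not stop a long collection run.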

Pipeline

The system follows a sequential pipeline:

1. Data Ingestion:

  • Downloads data from GitHub.
  • Unzips and saves data to DataIngestionArtiacts folder.
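
A download-and-unzip step of this kind could look roughly like the sketch below. The function name `ingest`, the `archive_name` default, and the URL passed in are all placeholders; the real download location is not given in this document.

```python
import urllib.request
import zipfile
from pathlib import Path

def ingest(url, artifacts_dir, archive_name="data.zip"):
    """Download a zip archive and extract it into `artifacts_dir`."""
    out = Path(artifacts_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Save the archive next to the extracted files.
    archive = out / archive_name
    urllib.request.urlretrieve(url, archive)

    # Extract everything and report which files were unpacked.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out)
        return sorted(zf.namelist())
```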

2. Data Preprocessing:

  • Cleans and prepares data.
  • Generates two CSV files:
    • movieId, title, genres, imdbId (saved in DataPreprocessingArtiacts)
    • imdbId (used for further data collection)

3. Data Collection:

  • Fetches additional attributes using Cinemagoer library and imdbId.
  • Retrieves data from GitHub by default (faster).
  • Option to fetch live data by setting COLLECTION_FLAG to True (slower).
  • Saves data to DataCollectionArtiacts folder.
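
The effect of COLLECTION_FLAG can be pictured as a simple switch between two data sources. In this sketch only the flag name comes from the project; `collect`, `load_cached`, and `fetch_live` are hypothetical stand-ins for whatever the pipeline actually calls.

```python
COLLECTION_FLAG = False  # False: use the pre-collected copy (faster); True: fetch live (slower)

def collect(imdb_ids, load_cached, fetch_live, collection_flag=COLLECTION_FLAG):
    """Return movie attributes from the cached copy or from live lookups."""
    if not collection_flag:
        # Default path: one bulk load, no per-movie requests.
        return load_cached()
    # Live path: one lookup per imdbId.
    return {imdb_id: fetch_live(imdb_id) for imdb_id in imdb_ids}
```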

4. Data Transformation:

  • Combines data from both phases.
  • Transforms data for model development.
  • Saves data to DataTransformationArtiacts folder.
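
One plausible shape for this step is to join the two phases on imdbId and flatten each movie into a single text field for the vectorizer. The sketch below assumes that approach; the function name `build_tags`, the `tags` field, the choice of fields, and the decision to skip movies without collected attributes are illustrative, not taken from the project.

```python
def build_tags(movies, attributes):
    """movies: dicts with movieId/title/genres/imdbId; attributes: imdbId -> collected fields."""
    combined = []
    for movie in movies:
        extra = attributes.get(movie["imdbId"])
        if extra is None:
            continue  # assumption: drop movies with no collected attributes

        parts = movie["genres"].split("|")  # MovieLens separates genres with "|"
        parts.append(extra.get("plot", ""))
        for key in ("cast", "director"):
            # Join multi-word names so "Tom Hanks" becomes the single token "TomHanks".
            parts.extend(name.replace(" ", "") for name in extra.get(key, []))

        combined.append({
            "imdbId": movie["imdbId"],
            "title": movie["title"],
            "tags": " ".join(p for p in parts if p).lower(),
        })
    return combined
```

Collapsing names into single tokens keeps two different people who share a first name from counting as a match.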

5. Model Development:

  • Calculates movie similarities using CountVectorizer and cosine similarity.
  • Uses title and imdbId for recommendations.
  • Saves model artifacts to ModelDevelopmentArtiacts folder (including a Data folder).
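
To make the similarity step concrete, here is a dependency-free sketch of what word counting plus cosine similarity computes. It is a plain-Python stand-in, not the project's code: scikit-learn's CountVectorizer and cosine_similarity do the same thing over sparse matrices. The names `count_vector`, `cosine`, `recommend`, and the `tags_by_title` mapping are hypothetical.

```python
import math
import re
from collections import Counter

# Tokens of two or more word characters, matching CountVectorizer's default pattern.
TOKEN = re.compile(r"\b\w\w+\b")

def count_vector(text):
    """Bag-of-words counts for one document."""
    return Counter(TOKEN.findall(text.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(title, tags_by_title, top_n=5):
    """Return the `top_n` titles whose tag text is most similar to `title`'s."""
    query = count_vector(tags_by_title[title])
    scores = [
        (cosine(query, count_vector(tags)), other)
        for other, tags in tags_by_title.items()
        if other != title
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by title
    return [other for _, other in scores[:top_n]]
```

For a full catalogue the pairwise similarities would normally be computed once and saved as a matrix rather than recomputed per query, which is consistent with the model artifacts mentioned above.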

6. Website Development:

  • The Flask framework is used to build the user interface.
  • Users can select a movie and receive recommendations.

Execution:

To execute this project and run the pipeline, install all the requirements and run the following command:

 	python main.py   

Pipeline progress is written to the logs folder, which is created during the run. The output of every component is saved inside the artifacts folder. A Data folder is also created; the web application reads from it.

Demo of Project:

Make sure you have run main.py first. Then start the demo with the following command:

	python app.py

Deployment:

This project is deployed on Azure with continuous deployment.

Link:

https://movie-recommend.azurewebsites.net/

Contributor:

  • Ravi Kumar
