In this assignment, we conducted an in-depth movie similarity analysis using the Movie Plot Synopses dataset available on Kaggle. You can access the dataset here.

- Kaggle Dataset Page: Access the Movie Plot Synopses dataset on Kaggle through the provided link.
- Download the Dataset: Download the dataset in CSV format using the available option on Kaggle.

The dataset provides the following fields for each movie:

- IMDb ID: A unique identifier for each movie, which makes cross-referencing straightforward.
- Movie Title: The title of each movie, for easy identification and categorization.
- Plot Synopsis: A detailed plot synopsis, which supplies the text data used in the analysis.
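
As a rough illustration of reading these fields, the sketch below loads a CSV into a list of records using only the standard library. The column names `imdb_id`, `title`, and `plot_synopsis` are assumptions made for this example and should be checked against the header of the downloaded file. The in-memory sample stands in for the real data.

```python
import csv
import io

# Assumed column names; verify them against the header of the downloaded CSV.
ID_COL, TITLE_COL, PLOT_COL = "imdb_id", "title", "plot_synopsis"

def load_movies(csv_file):
    """Read movie records from an open CSV file object into a list of dicts."""
    movies = []
    for row in csv.DictReader(csv_file):
        synopsis = (row.get(PLOT_COL) or "").strip()
        if not synopsis:
            continue  # skip rows that have no synopsis text
        movies.append({
            "imdb_id": row[ID_COL],
            "title": row[TITLE_COL],
            "synopsis": synopsis,
        })
    return movies

# Tiny made-up sample standing in for the downloaded file.
sample = io.StringIO(
    "imdb_id,title,plot_synopsis\n"
    "tt0000001,Example One,A hero builds a suit.\n"
    "tt0000002,Example Two,\n"
)
movies = load_movies(sample)
print(len(movies), movies[0]["title"])  # 1 Example One
```

For the real dataset, the same function can be called on a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the CSV was saved.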

Each synopsis was preprocessed in three steps:

- Tokenization: Split each synopsis into individual words (tokens), which form the basis for the analysis.
- Lemmatization: Reduce each token to its base or dictionary form using the NLTK lemmatizer.
- Stopword Removal: Remove common English words that contribute little to meaning, using the NLTK stopwords list.
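
The three steps above can be sketched as a single function. To keep the example runnable without downloading NLTK corpora, it uses a regular-expression tokenizer, a tiny illustrative stopword set, and a toy lemmatizer; all three are stand-ins chosen for this sketch and not the project's actual NLTK components.

```python
import re

# Small illustrative stopword set; the project uses NLTK's English list instead.
DEMO_STOPWORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in", "his"}

def preprocess(text, stopwords=DEMO_STOPWORDS, lemmatize=None):
    """Tokenize, lemmatize, and drop stopwords from one synopsis."""
    tokens = re.findall(r"[a-z]+", text.lower())             # tokenization
    if lemmatize is not None:
        tokens = [lemmatize(tok) for tok in tokens]          # lemmatization
    return [tok for tok in tokens if tok not in stopwords]   # stopword removal

# Toy lemmatizer standing in for NLTK's WordNetLemmatizer.
toy_lemmas = {"builds": "build", "suits": "suit"}
tokens = preprocess(
    "The hero builds armored suits in his garage.",
    lemmatize=lambda tok: toy_lemmas.get(tok, tok),
)
print(tokens)  # ['hero', 'build', 'armored', 'suit', 'garage']
```

With NLTK installed and its `stopwords` and `wordnet` corpora downloaded, the stand-ins can be swapped for `set(nltk.corpus.stopwords.words("english"))` and `nltk.stem.WordNetLemmatizer().lemmatize`.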

Three text representations were computed for each synopsis:

- Bag of Words (BoW)
- Term Frequency-Inverse Document Frequency (TF-IDF)
- Continuous Bag of Words (CBOW)

Cosine similarity is used to quantify how similar two movie plot synopses are, producing three similarity scores per pair, one each for the BoW, TF-IDF, and CBOW representations.
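
To make the count-based scores concrete, here is a minimal pure-Python sketch of BoW vectors, TF-IDF weighting, and cosine similarity on three made-up token lists. The smoothed IDF formula and the helper names are choices made for this example; the notebook may well rely on a library vectorizer instead.

```python
import math
from collections import Counter

def bow_vector(tokens):
    """Bag of Words: map each term to its raw count."""
    return Counter(tokens)

def tfidf_vectors(docs):
    """TF-IDF: weight each term count by a smoothed inverse document frequency."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Made-up, already preprocessed synopses.
docs = [
    ["hero", "build", "suit"],
    ["hero", "build", "new", "suit"],
    ["detective", "solve", "murder"],
]
bow = [bow_vector(d) for d in docs]
tfidf = tfidf_vectors(docs)

sim_bow_close = cosine_similarity(bow[0], bow[1])
sim_bow_far = cosine_similarity(bow[0], bow[2])
sim_tfidf_close = cosine_similarity(tfidf[0], tfidf[1])
print(round(sim_bow_close, 3))  # 0.866
print(sim_bow_far)              # 0.0
print(round(sim_tfidf_close, 3))
```

Documents that share no terms score 0 under both count-based representations, which is one reason an embedding-based representation can behave differently on synopses that describe similar plots in different words.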
Of the three, the CBOW model performed best, capturing contextual detail in the synopses that the count-based BoW and TF-IDF representations miss.
For example, for "Iron Man" and its sequels, CBOW produced similarity scores of 93.01% for "Iron Man" and 87.84% for "Iron Man 3."
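
One common way to turn CBOW word embeddings into a single vector per synopsis is to average the word vectors and then compare the averages with cosine similarity, expressed as a percentage. The sketch below assumes that approach; whether the notebook averages vectors this way is not stated here. The 2-dimensional vectors are made up for illustration, whereas real ones would come from a trained model (for instance, gensim's `Word2Vec` trains CBOW when `sg=0`).

```python
import math

def document_vector(tokens, word_vectors):
    """Average the vectors of all tokens that have an embedding."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None  # no token of this document has an embedding
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

def cosine_percent(u, v):
    """Cosine similarity between two dense vectors, as a percentage."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 100.0 * dot / norm if norm else 0.0

# Made-up 2-dimensional embeddings standing in for trained CBOW vectors.
word_vectors = {
    "hero": [1.0, 0.0],
    "suit": [0.8, 0.2],
    "murder": [0.0, 1.0],
}
doc_a = document_vector(["hero", "suit"], word_vectors)
doc_b = document_vector(["hero", "suit", "unknown"], word_vectors)  # unknown token is ignored
doc_c = document_vector(["murder"], word_vectors)

score_ab = cosine_percent(doc_a, doc_b)
score_ac = cosine_percent(doc_a, doc_c)
print(round(score_ab, 2), round(score_ac, 2))
```

Because similar words receive nearby vectors, two synopses can score highly even when they share few exact terms.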
[GitHub Link](https://github.com/Vikas-ABD/Movie_Similarity_Analysis_Recommendation.git)

- Clone the Repository:

  ```bash
  git clone https://github.com/Vikas-ABD/Movie_Similarity_Analysis_Recommendation.git
  ```

- Install Requirements:

  ```bash
  pip install -r requirements.txt
  ```

- Run the Notebook in Google Colab: Execute the notebook to generate the `similarity.pkl` and `data_frame.pkl` files.

- Create Streamlit Web Application:
  - Use the generated files to build a user-friendly Streamlit web application.
  - Fetch movie images from the IMDb site through an API.

- Run the Web Application:

  ```bash
  streamlit run app.py
  ```

  Then access the web application locally to explore movie recommendations.
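
The recommendation step inside the app can be sketched as a lookup into the precomputed similarity matrix. This example assumes that `similarity.pkl` holds a square matrix of pairwise scores and that the movie titles are available in the same row order (for instance from `data_frame.pkl`); both layouts are assumptions, and `recommend` is a hypothetical helper name. The titles and scores are made up.

```python
import pickle

def recommend(title, titles, similarity, top_n=5):
    """Return the top_n (title, score) pairs most similar to `title`, best first."""
    idx = titles.index(title)  # raises ValueError if the title is unknown
    scores = [(j, s) for j, s in enumerate(similarity[idx]) if j != idx]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [(titles[j], s) for j, s in scores[:top_n]]

# Made-up data standing in for the contents of the pickle files.
titles = ["Movie A", "Movie B", "Movie C"]
similarity = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
# Round-trip through pickle to mimic saving and loading the matrix.
similarity = pickle.loads(pickle.dumps(similarity))

top = recommend("Movie A", titles, similarity, top_n=2)
print(top)  # [('Movie B', 0.9), ('Movie C', 0.2)]
```

In the app itself, the matrix would instead be read from disk with `pickle.load` on a file opened in binary mode.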

Web app working demo: