Movie Similarity Analysis and Recommendation

Assignment Overview

In this assignment, we conducted an in-depth movie similarity analysis using the Movie Plot Synopses dataset available on Kaggle. You can access the dataset here.

Data Collection

Kaggle Dataset Page: Access the Movie Plot Synopses dataset on Kaggle through the provided link.
Download the Dataset: Download the dataset in CSV format using the available option on Kaggle.

Dataset Contents

IMDb ID: Unique identifier for each movie, which makes cross-referencing and analysis easier.
Movie Title: The title of each movie, for easy identification and categorization.
Plot Synopses: Detailed plot synopses provide a rich source of textual data, capturing the narrative essence of each film.
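
As a rough illustration of the three fields above, the sketch below reads a tiny inline CSV with Python's standard `csv` module. The column names `imdb_id`, `title`, and `plot_synopsis`, the helper `load_synopses`, and the sample rows are assumptions made for this example; check the header of the downloaded file before relying on them.

```python
import csv
import io

# Stand-in for the downloaded CSV; rows and column names are illustrative only.
SAMPLE_CSV = """imdb_id,title,plot_synopsis
tt0000001,Example Movie A,"A scientist builds a robot that learns to feel."
tt0000002,Example Movie B,"Two friends cross the desert to find a lost city."
"""


def load_synopses(handle):
    """Return one dict per movie, skipping rows that have no synopsis text."""
    rows = []
    for row in csv.DictReader(handle):
        synopsis = (row.get("plot_synopsis") or "").strip()
        if synopsis:
            rows.append({
                "imdb_id": row["imdb_id"],
                "title": row["title"],
                "plot_synopsis": synopsis,
            })
    return rows


movies = load_synopses(io.StringIO(SAMPLE_CSV))
print(len(movies), movies[0]["title"])
```

To read the real file, pass an open file handle instead of the `io.StringIO` wrapper.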

Text Preprocessing

Creating Processed Corpus

Tokenization: Break down each synopsis into individual words or tokens, forming a basis for analysis.
Lemmatization: Reduce words to their base (dictionary) form using the NLTK lemmatizer, taking contextual meaning into account.
Stopword Removal: Utilize the NLTK stopwords list for English to remove common words that do not contribute significantly to meaning.
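
The three steps above could be combined roughly as follows. This is a minimal sketch, not the notebook's code: the regex tokenizer and the function name `preprocess` are assumptions, and the small fallback stopword list exists only so the example still runs when NLTK or its data files are not installed.

```python
import re

try:
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    STOPWORDS = set(stopwords.words("english"))
    _wordnet = WordNetLemmatizer()
    _wordnet.lemmatize("films")  # touch WordNet now so missing data fails here
    lemmatize = _wordnet.lemmatize
except (ImportError, LookupError):
    # Fallback so the sketch runs without NLTK or its data files
    # (normally fetched with nltk.download("stopwords") and nltk.download("wordnet")).
    STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "his", "he"}

    def lemmatize(word):
        return word  # no lemmatization in fallback mode


def preprocess(synopsis):
    """Tokenize, drop stopwords, and lemmatize one synopsis."""
    # Simple regex tokenizer (an assumption; the notebook may tokenize differently).
    tokens = re.findall(r"[a-z]+", synopsis.lower())
    return [lemmatize(tok) for tok in tokens if tok not in STOPWORDS]


print(preprocess("The engineer builds a suit of armor in a cave."))
```

Note that `WordNetLemmatizer.lemmatize` treats every word as a noun unless a `pos` argument is passed, so context-aware lemmatization would also need part-of-speech tags.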

Utilized Word Embedding Techniques

Bag of Words (BoW):
- Represents each document as an unordered collection of words and their counts, disregarding grammar and word order.
Term Frequency-Inverse Document Frequency (TF-IDF):
- Weighs words based on their frequency in the current document against their frequency in the entire dataset.
Continuous Bag of Words (CBOW):
- Neural network-based word embedding model (a Word2Vec architecture) that learns word semantics by predicting a word from its surrounding context words.
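
To make the first two representations concrete, here is a small plain-Python sketch of BoW counts and TF-IDF weights. It is illustrative only: the function names, the toy corpus, and the unsmoothed formula tf * log(N / df) are assumptions, not the notebook's implementation.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each word to its count in one document (word order is discarded)."""
    return Counter(tokens)


def tf_idf(corpus):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(corpus)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    weighted = []
    for doc in corpus:
        counts = Counter(doc)
        weighted.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weighted


# Toy, already-preprocessed corpus (illustrative tokens, not dataset rows).
corpus = [
    ["hero", "build", "armor", "fight", "villain"],
    ["hero", "lose", "armor", "fight", "terrorist"],
    ["couple", "fall", "love", "paris"],
]
print(bag_of_words(corpus[0]))
weights = tf_idf(corpus)
print(weights[0]["build"] > weights[0]["hero"])  # rarer word gets more weight
```

Library implementations often use a smoothed variant of the IDF term, so their numbers will differ from this sketch even on the same corpus.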

Similarity Analysis Using Cosine Similarity

Cosine similarity is used to quantify how similar two movie plot synopses are. It is computed separately on the BoW, TF-IDF, and CBOW representations, producing three distinct similarity scores.
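
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch for sparse vectors stored as dictionaries is shown below; the function name, the toy vectors, and the choice to return 0.0 for an empty document are assumptions for illustration.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as {key: value} dicts."""
    dot = sum(value * vec_b.get(key, 0.0) for key, value in vec_a.items())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


# Toy count vectors (illustrative only).
doc_a = {"hero": 1, "armor": 1, "fight": 1}
doc_b = {"hero": 1, "armor": 1, "love": 1}
doc_c = {"paris": 1, "love": 1}

print(round(cosine_similarity(doc_a, doc_b), 4))  # 2 shared words out of 3 each: 0.6667
print(cosine_similarity(doc_a, doc_c))            # no shared words: 0.0
```

Multiplying the result by 100 gives a percentage score of the kind reported in this README.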

... | ... | ...

Of the three, the CBOW model performed best, capturing more of the contextual nuance in the movie synopses than BoW or TF-IDF.

Noteworthy examples include "Iron Man" and its sequels, where CBOW achieved impressive similarity scores, such as 93.01% for "Iron Man" and 87.84% for "Iron Man 3."

GitHub Repository

[GitHub Link](https://github.com/Vikas-ABD/Movie_Similarity_Analysis_Recommendation.git)

Instructions

Clone the Repository:
```
git clone https://github.com/Vikas-ABD/Movie_Similarity_Analysis_Recommendation.git
```

Install Requirements:
```
pip install -r requirements.txt
```
Run the Notebook in Google Colab: Execute the notebook to generate similarity.pkl and data_frame.pkl files.
Create Streamlit Web Application:
- Utilize the generated files for creating a user-friendly Streamlit web application.
- Retrieve movie images from the IMDb site through an API.
Run the Web Application:
```
streamlit run app.py
```
Access the web application locally for exploring movie recommendations.
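
The lookup behind a recommendation can be sketched as follows. This is a hedged example, not the code in app.py: it assumes that similarity.pkl holds a square matrix of pairwise scores whose rows line up with the movies in data_frame.pkl, and it uses invented titles and scores in place of the unpickled objects. The function name `recommend` is hypothetical.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the top_n (title, score) pairs most similar to `title`."""
    index = titles.index(title)  # raises ValueError for an unknown title
    scored = [
        (titles[j], score)
        for j, score in enumerate(similarity[index])
        if j != index  # skip the movie itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy stand-ins for the unpickled objects; titles and scores are invented.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
similarity = [
    [1.00, 0.82, 0.10, 0.45],
    [0.82, 1.00, 0.05, 0.30],
    [0.10, 0.05, 1.00, 0.20],
    [0.45, 0.30, 0.20, 1.00],
]
print(recommend("Movie A", titles, similarity, top_n=2))
# [('Movie B', 0.82), ('Movie D', 0.45)]
```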

Download the required files listed in download_required_files.txt.

Web app working demo:

The web app is built with Streamlit and can also be deployed in the cloud.

