This project builds a recommendation system for Netflix titles using embeddings generated by an all-mpnet-base-v2 model and a similarity search index built with FAISS.
The project takes a CSV file (by default, netflix_titles.csv) containing information about Netflix movies and TV shows, preprocesses the fields to create a representative text for each title, and then:
- Generates Embeddings: Uses a SentenceTransformer model to convert the representative text into numerical vectors.
- Builds a FAISS Index: Indexes the generated embeddings using FAISS, enabling similarity search.
- Interactive Recommendation System: When a user enters a query in natural language, the system converts the query into an embedding and retrieves the titles whose content embeddings are most similar to it.
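The retrieval step can be sketched as follows. This is a minimal sketch, not the project's code: it assumes the title embeddings are rows of a float32 NumPy array and are L2-normalized (for example via `SentenceTransformer.encode(..., normalize_embeddings=True)`), so that inner product equals cosine similarity. It performs the same exact, brute-force search that a FAISS `IndexFlatIP` performs through `index.add(...)` and `index.search(queries, k)`, but in plain NumPy so it runs without FAISS. Whether the project uses `IndexFlatIP` or another index type is an assumption, and the helper names `l2_normalize` and `exact_search` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product equals cosine similarity."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return x / np.maximum(norms, 1e-12)


def exact_search(index_vectors: np.ndarray, queries: np.ndarray, k: int):
    """Hypothetical helper: brute-force top-k inner-product search.

    Returns (scores, ids), each of shape (n_queries, k), ordered from most to
    least similar, mirroring the (D, I) pair returned by a FAISS index search.
    """
    scores = queries @ index_vectors.T  # shape (n_queries, n_items)
    k = min(k, index_vectors.shape[0])
    # Sorting negated scores gives descending order; "stable" breaks ties by id.
    ids = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    top_scores = np.take_along_axis(scores, ids, axis=1)
    return top_scores, ids


# Toy 3-dimensional "embeddings" standing in for real model output.
titles = ["Space Drama", "Cooking Show", "Alien Thriller"]
items = l2_normalize(np.array([[1.0, 0.2, 0.0],
                               [0.0, 1.0, 0.1],
                               [0.9, 0.0, 0.3]]))
query = l2_normalize(np.array([[1.0, 0.0, 0.2]]))

scores, ids = exact_search(items, query, k=2)
for score, i in zip(scores[0], ids[0]):
    print(f"{titles[i]}: {score:.3f}")
```

With FAISS installed, the equivalent would be to create `faiss.IndexFlatIP(d)` for embedding dimension `d`, call `index.add(items)`, and then `index.search(query, 2)`. Exact search is simple and fits a catalog of this size; approximate FAISS indexes (such as `IndexIVFFlat`) trade some recall for speed on much larger collections.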
The CSV file is sourced from Kaggle. You can find the dataset here.
- Text Processing: Cleans and normalizes the fields of each title.
- Embedding Generation: Uses the Sentence Transformers library to obtain vector representations of texts.
- FAISS Index: Utilizes FAISS to index and search through large collections of vectors.
- Interactive Recommendations: Allows users to input a natural language query to receive recommendations based on content similarity.
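The text-processing step can be sketched like this. It is a minimal sketch under stated assumptions: the column names (`type`, `title`, `director`, `cast`, `country`, `listed_in`, `description`) are those of the Kaggle `netflix_titles.csv` as commonly distributed, while the choice of fields, the labels, the cleaning rules (Unicode NFKC normalization, whitespace collapsing, dropping empty values), and the `clean`/`build_text` helpers are hypothetical. The project's actual field selection and text template may differ.

```python
import csv
import io
import re
import unicodedata

# Hypothetical selection of CSV columns to include, with a label for each.
FIELDS = [
    ("type", "Type"),
    ("title", "Title"),
    ("director", "Director"),
    ("cast", "Cast"),
    ("country", "Country"),
    ("listed_in", "Genres"),
    ("description", "Description"),
]


def clean(value) -> str:
    """Normalize Unicode and whitespace; map missing values (None, '', 'nan') to ''."""
    if value is None:
        return ""
    text = unicodedata.normalize("NFKC", str(value))
    text = re.sub(r"\s+", " ", text).strip()
    return "" if text.lower() in {"nan", "none"} else text


def build_text(row: dict) -> str:
    """Join the non-empty cleaned fields into one 'Label: value' string per title."""
    parts = []
    for column, label in FIELDS:
        value = clean(row.get(column))
        if value:
            parts.append(f"{label}: {value}")
    return " | ".join(parts)


# A one-row, made-up sample in the same CSV layout (the real file has more columns).
sample = io.StringIO(
    "show_id,type,title,director,cast,country,listed_in,description\n"
    's1,Movie,Example  Film,,"A. Actor, B. Actor",Spain,"Dramas, Thrillers",A  tense   story.\n'
)
texts = [build_text(row) for row in csv.DictReader(sample)]
print(texts[0])
```

Empty fields such as the missing director are skipped rather than emitted as "Director: ", so they add no noise to the embedding input.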
- Clone the Repository: Start by cloning the project repository.
- Install Python: Make sure Python is installed on your system. You can download it from the official Python website.
- Install Conda: Ensure Conda is installed on your system. If not, download and install it from Miniconda or Anaconda.
- Install Dependencies: Set up the environment using the provided environment.yml file.
conda env create -f environment.yml
conda activate embeddings-recommender-netflix
- Run the Script: Execute the main script to generate the text strings, embeddings, and FAISS index, and then to run the recommender.
python main.py
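The interactive part of the script can be pictured as a simple prompt loop. This is a hypothetical sketch rather than the contents of main.py: `search_fn` stands in for whatever embeds the query (presumably with the same SentenceTransformer model used for the titles) and searches the FAISS index, and the names `interactive_loop`, `search_fn`, and the blank-line exit convention are assumptions. Passing the input and output functions as parameters keeps the loop testable without a model or a terminal.

```python
from typing import Callable, List, Tuple

# A search function takes (query, k) and returns (title, score) pairs, best first.
SearchFn = Callable[[str, int], List[Tuple[str, float]]]


def interactive_loop(search_fn: SearchFn, k: int = 5,
                     read: Callable[[str], str] = input,
                     write: Callable[[str], None] = print) -> int:
    """Prompt for queries until a blank line or EOF; return how many queries ran."""
    handled = 0
    while True:
        try:
            query = read("Describe what you want to watch (blank to quit): ").strip()
        except EOFError:
            break
        if not query:
            break
        handled += 1
        for rank, (title, score) in enumerate(search_fn(query, k), start=1):
            write(f"{rank}. {title} (similarity {score:.3f})")
    return handled


# Demo with scripted input and a canned search function (both hypothetical).
def fake_search(query: str, k: int) -> List[Tuple[str, float]]:
    return [("Example Title A", 0.91), ("Example Title B", 0.87)][:k]


answers = iter(["space adventure", ""])
lines: List[str] = []
count = interactive_loop(fake_search, k=2,
                         read=lambda prompt: next(answers),
                         write=lines.append)
print("\n".join(lines))
```

In a real run, `read` and `write` keep their defaults (`input` and `print`), and `search_fn` would wrap the query encoding and index search.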