# Movie Recommendation System
This project uses a dataset of movies, user ratings, and other metadata to generate movie recommendations. The recommendations are generated using a combination of TF-IDF vectorization, K-means clustering, and the OpenAI GPT-3.5 API.
## Datasets
The following datasets are used in this project:
- `genome-tags.csv`
- `genome-scores.csv`
- `ratings.csv`
- `tags.csv`
- `links.csv`
- `movies.csv`
## Prerequisites
Ensure you have the following Python packages installed:
- pandas
- scikit-learn
- requests
- openai
You can install the required packages using the following command:
```bash
pip install pandas scikit-learn requests openai
Replace YOUR_OPENAI_API_KEY
in the main
function with your actual OpenAI API key.
- Ensure you have all the required datasets in the specified paths.
- Run the script by executing the following command:
python movie_recommendation_system.py
Loads the necessary datasets for the project.
Preprocesses the movies dataset by removing movies with no genres listed and merging with the links dataset.
Performs TF-IDF vectorization on movie titles and genres and applies K-means clustering.
Selects one movie from each cluster.
Uses the OpenAI API to generate movie recommendations based on the selected movies.
Checks if the predicted movie is present in the dataset.
Maps predicted movie titles to their respective movie IDs.
Main function to execute the workflow of loading data, preprocessing, clustering, selecting movies, getting recommendations, and calculating percentages.
The final output is saved to output1.csv
which contains the recommended movies and their respective ratings.
- The script calculates the percentage of users whose ratings for the recommended movies are within a threshold of 0.5 from their original ratings.
- Ensure the OpenAI API key is kept secure and not shared publicly.