The main goal of this project is to develop an AI-based book recommendation system that can suggest similar books to users based on a given book title. The purpose is to enhance the book discovery experience by leveraging deep learning and hybrid recommendation techniques.
Dataset link: https://www.kaggle.com/datasets/mdhamani/goodreads-books-100k/
This project aims to improve upon an existing book recommendation system by integrating an AI model that suggests books from a large dataset. The recommendation system combines traditional content-based filtering with neural collaborative filtering (NCF) and convolutional neural network (CNN) models to enhance recommendation accuracy.
-
Data Preprocessing:
- Cleaned and prepared the dataset by handling missing values and encoding categorical data.
- Generated TF-IDF vectors for book descriptions and computed cosine similarity for content-based filtering.
-
Model Training:
- Implemented and trained a Neural Collaborative Filtering (NCF) model using user and book embeddings.
- Implemented and trained a Convolutional Neural Network (CNN) model to predict book ratings based on descriptions.
-
Hybrid Recommendation System:
- Combined the results from cosine similarity, NCF, and CNN models to provide a robust recommendation system.
-
Deployment:
- Created a Streamlit application to provide an interface for users to get book recommendations.
-
Cosine Similarity: Used for content-based filtering based on book descriptions.
- Chosen for its simplicity and effectiveness in measuring similarity between text vectors.
-
Neural Collaborative Filtering (NCF): Used for collaborative filtering based on user-book interactions.
- Chosen for its ability to learn user and item embeddings that capture latent factors in the data.
-
Convolutional Neural Network (CNN): Used to predict book ratings based on textual descriptions.
- Chosen for its ability to capture complex patterns in text data through hierarchical feature extraction.
- pandas
- numpy
- scikit-learn
- torch
- tensorflow
- keras
-
Neural Collaborative Filtering (NCF) Model:
- Test MAE: 0.2378
-
Convolutional Neural Network (CNN) Model:
- Test MAE: 0.2390
-
Cosine Similarity:
- Provides top-N recommendations based on content similarity.