Anime Recommendation System based on Hybrid Filtering Model

This project presents an anime recommendation system developed as coursework. The system uses a hybrid filtering approach that combines content-based analysis with user-centric features to produce personalized, high-quality anime recommendations. The full research process, from data exploration to model implementation, is documented in a series of Jupyter notebooks.

Methodology

The project has two phases: an exploratory data analysis (EDA) and the development of a hybrid recommendation model.

Data Exploration and Analysis

A comprehensive EDA was conducted to understand the underlying patterns within the anime and user datasets. This phase addressed several key research questions (RQs):

- RQ1 & RQ2: How anime ratings are distributed and how they correlate with basic features such as genre, type, premiere year, and episode count. Specific genres and formats (e.g., TV, OVA) show distinct rating patterns.
- RQ3 & RQ4: How users rate, including rating distributions and potential biases related to user demographics (gender, age) and viewing habits (e.g., total days watched, number of completed series). Rating tendencies differ discernibly across user groups.

These insights were instrumental in shaping the feature engineering and filtering criteria for the recommendation model.
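As a rough illustration of the per-feature rating analysis in RQ1 and RQ2, the sketch below groups a handful of made-up records by format and averages their scores. The field names (type, score), the sample rows, and the helper mean_score_by are assumptions for illustration, not the dataset's actual schema or the notebooks' code. The notebooks use Pandas; this version sticks to the standard library so it runs anywhere.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows; the real dataset's column names may differ.
sample_rows = [
    {"type": "TV", "score": 7.8},
    {"type": "TV", "score": 6.9},
    {"type": "OVA", "score": 6.4},
    {"type": "Movie", "score": 8.1},
    {"type": "OVA", "score": 7.0},
]


def mean_score_by(rows, key):
    """Return {group value: mean score} for the given grouping key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["score"])
    return {group: mean(scores) for group, scores in groups.items()}


by_type = mean_score_by(sample_rows, "type")
for anime_type, avg in sorted(by_type.items()):
    print(f"{anime_type}: {avg:.2f}")
```

With Pandas, the same aggregation would typically be a groupby on the format column followed by the mean of the score column.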

Recommendation Model

To address the primary research objective (RQ5: How to effectively recommend high-quality animations?), a hybrid recommendation model was implemented. The model architecture integrates content-based filtering with collaborative and quality-based elements.

Content-Based Filtering:
- The core of the model is a semantic representation of anime content. A pre-trained BERT (Bidirectional Encoder Representations from Transformers) model (bert-base-uncased) generates a dense vector embedding from each anime's textual synopsis.
- Cosine similarity between these embeddings then quantifies how similar two anime are in content.
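The two content-based steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function names embed_synopses, cosine_similarity, and most_similar are hypothetical, and the pooling strategy (a mask-aware mean over token vectors) and the 512-token truncation are assumptions, since the source does not say how the synopsis embedding is derived from BERT's output. The model directory is the one listed under Prerequisites.

```python
import math


def embed_synopses(texts, model_dir="src/bert-base-uncased"):
    """Return one mean-pooled BERT vector (list of floats) per synopsis.

    Heavy dependencies are imported here so the similarity helpers below
    can be used without PyTorch or Transformers installed.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    model.eval()
    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # Assumption: average the token vectors, ignoring padding positions.
    mask = enc["attention_mask"].unsqueeze(-1).float()
    summed = (out.last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)
    return (summed / counts).tolist()


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_idx, embeddings, k=5):
    """Return the k (index, similarity) pairs closest to the query anime."""
    scored = [
        (i, cosine_similarity(embeddings[query_idx], emb))
        for i, emb in enumerate(embeddings)
        if i != query_idx
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Calling embed_synopses requires PyTorch, Transformers, and the local model files. For a full catalogue, scikit-learn's cosine_similarity computes the same quantity for a whole embedding matrix at once.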
Hybridization and Personalization:
- User Demographics: The model incorporates user age and gender for personalization. An age-based filter excludes mature content (R-rated) for younger users.
- Collaborative Element: Gender-based preferences are integrated by weighting similarity scores. The system adjusts recommendations based on the historical average ratings given by a user's gender to specific anime genres.
- Quality-Based Ranking: To ensure the quality of recommendations, a final weighted_score is calculated for each candidate anime. This score is a composite of its official Score, Favorites count, and Popularity rank, effectively prioritizing critically acclaimed and popular titles.
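One way the three personalization and ranking elements could fit together is sketched below. Everything specific here is an assumption made for illustration: the Anime record and its field names, the age cutoff of 18, rating labels that begin with "R", the use of a ratio against an overall mean for the gender weight, and the 0.5/0.25/0.25 weights. The source names the ingredients (Score, Favorites, Popularity rank) but not the formula.

```python
from dataclasses import dataclass


@dataclass
class Anime:
    title: str
    genres: tuple
    rating: str       # age-rating label, e.g. "PG-13" or "R - 17+"
    score: float      # site score, assumed to lie in 0..10
    favorites: int    # number of users who marked it as a favorite
    popularity: int   # popularity rank, 1 = most popular


ADULT_AGE = 18  # hypothetical cutoff; the source only says "younger users"


def allowed_for_age(anime, user_age):
    """Age filter: hide R-rated titles from users below the cutoff."""
    return user_age >= ADULT_AGE or not anime.rating.startswith("R")


def gender_weight(anime, gender, genre_avg, overall_avg):
    """Ratio of the gender group's mean genre rating to the overall mean.

    genre_avg maps (gender, genre) -> historical average rating.
    Returns 1.0 (neutral) when none of the anime's genres has data.
    """
    known = [genre_avg[(gender, g)] for g in anime.genres
             if (gender, g) in genre_avg]
    if not known or overall_avg <= 0:
        return 1.0
    return (sum(known) / len(known)) / overall_avg


def weighted_score(anime, max_favorites, max_rank,
                   w_score=0.5, w_fav=0.25, w_pop=0.25):
    """Blend score, favorites, and popularity rank into a single value."""
    score_part = anime.score / 10.0
    fav_part = anime.favorites / max_favorites if max_favorites else 0.0
    # Rank 1 is best, so invert it: rank 1 -> 1.0, worst rank -> near 0.
    pop_part = 1.0 - (anime.popularity - 1) / max_rank if max_rank else 0.0
    return w_score * score_part + w_fav * fav_part + w_pop * pop_part


romance = Anime("Example A", ("Romance",), "PG-13", 8.0, 500, 1)
mature = Anime("Example B", ("Horror",), "R - 17+", 7.0, 100, 10)
genre_avg = {("Female", "Romance"): 8.0}  # made-up historical average

print(allowed_for_age(mature, user_age=15))
print(gender_weight(romance, "Female", genre_avg, overall_avg=7.5))
print(weighted_score(romance, max_favorites=500, max_rank=10))
```

In this sketch a content similarity would be multiplied by gender_weight before candidates are collected, and weighted_score would then order the surviving candidates.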

The final recommendation list is generated by consolidating candidates, removing duplicates and previously watched anime, and sorting them by the final weighted score.
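The consolidation step described above can be sketched as a small function. The name build_recommendations, the (anime_id, weighted_score) pair layout, and the choice to keep the highest score when an anime appears more than once are assumptions for illustration; the notebook's recommend_anime may be organized differently.

```python
def build_recommendations(candidates, watched_ids, num_recommendations=5):
    """Deduplicate, drop watched titles, and rank by weighted score.

    candidates: iterable of (anime_id, weighted_score) pairs; the same
    anime may appear several times with different scores.
    """
    best = {}
    for anime_id, score in candidates:
        if anime_id in watched_ids:
            continue  # never recommend something the user has already seen
        # Keep the highest score seen for each anime (a choice made here;
        # the source does not say which duplicate is kept).
        if anime_id not in best or score > best[anime_id]:
            best[anime_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:num_recommendations]


# Made-up candidates: anime 101 appears twice, anime 303 was already watched.
candidates = [(101, 0.90), (202, 0.80), (101, 0.70), (303, 0.95), (404, 0.60)]
top = build_recommendations(candidates, watched_ids={303}, num_recommendations=2)
print(top)
```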

Project Structure

The project's source code is organized into several Jupyter notebooks:

- src/data_exploration_and_RQ1-3.ipynb: Contains the complete process of data cleaning, preprocessing, and exploratory data analysis corresponding to Research Questions 1-3.
- src/RQ4.ipynb: Focuses on the analysis of Research Question 4, exploring the relationship between user behavior metrics and rating patterns.
- src/anime_recommendation.ipynb: Implements the core hybrid recommendation system (RQ5), including BERT embedding generation and the final recommendation logic.
- report/: Contains the detailed project report in PDF format.

Usage

To generate recommendations for a specific user, execute the recommend_anime function within the anime_recommendation.ipynb notebook.

# Example: Generate 5 recommendations for user with ID 20
recommend_anime(user_id=20, num_recommendations=5)

Prerequisites:

- Ensure all required libraries specified in the notebooks are installed.
- The bert-base-uncased model and tokenizer files should be located in the src/bert-base-uncased/ directory.
- The datasets should be placed in the src/data/ directory.
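Before running the notebook, it can help to confirm that the directories listed above are in place. The helper below is a hypothetical convenience, not part of the repository; it only checks that the two directories exist and does not validate their contents.

```python
from pathlib import Path

# Directories named in the prerequisites, relative to the repository root.
REQUIRED_DIRS = ("src/bert-base-uncased", "src/data")


def missing_prerequisites(repo_root="."):
    """Return the required directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


missing = missing_prerequisites()
if missing:
    print("Missing directories:", ", ".join(missing))
else:
    print("Model and data directories found.")
```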

Technology Stack

- Data Manipulation and Analysis: Pandas, NumPy
- Machine Learning and NLP: Scikit-learn, PyTorch, Transformers (Hugging Face)
- Data Visualization: Plotly
- Development Environment: Jupyter Notebook

Repository: yukito0209/anime-recommendation
