Streamist is a streaming company that offers web series and movies to a worldwide audience. Every piece of content on its portal is rated by viewers, and the portal also provides other information about each title, such as the number of people who have watched it, the number of people who want to watch it, the number of episodes, and the duration of an episode.
The company is currently focusing on the anime available on its portal and wants to identify the most important factors involved in rating an anime. The task is to identify those factors and build a model that predicts the rating of an anime.
The objective is to analyze the data and build a linear regression model that predicts anime ratings.
- What are the key factors influencing the rating of an anime?
- Is there a good predictive model for the rating of an anime? What does the performance assessment look like for such a model?
-
Data Collection & Preparation
- Imported the anime dataset.
- Cleaned missing data and standardized feature formats.
- Encoded categorical variables and scaled numerical values.
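
The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy DataFrame, the column names (`mediaType`, `eps`, `watched`, `rating`), the `prepare` helper, and the choices of median imputation, z-score scaling, and one-hot encoding are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the anime dataset; values and column names are hypothetical.
raw = pd.DataFrame({
    "mediaType": ["TV", "Movie", None, "TV", "OVA"],
    "eps": [12.0, 1.0, 24.0, None, 2.0],
    "watched": [1500.0, 300.0, 9000.0, 450.0, None],
    "rating": [3.9, 3.1, 4.4, 2.8, 3.5],
})


def prepare(df, target="rating"):
    """Impute missing values, scale numeric features, one-hot encode categoricals."""
    df = df.copy()
    num_cols = df.select_dtypes(include="number").columns.drop(target)
    cat_cols = df.select_dtypes(exclude="number").columns

    # Assumed imputation strategy: median for numeric, a placeholder label for categorical.
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    df[cat_cols] = df[cat_cols].fillna("Unknown")

    # Z-score scaling of the numeric features (the target is left untouched).
    df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)

    # One-hot encoding; drop_first avoids a redundant column in a linear model.
    return pd.get_dummies(df, columns=list(cat_cols), drop_first=True, dtype=float)


prepared = prepare(raw)
print(prepared.head())
```

In a real workflow the imputation and scaling statistics would normally be computed on the training split only and then applied to the test split, so that no information leaks from the held-out data.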
-
Exploratory Data Analysis (EDA)
- Visualized distributions of ratings, genres, and episode counts.
- Generated correlation heatmaps to identify relationships between features.
- Detected outliers and anomalies.
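
As a rough sketch of the correlation and outlier steps above, the snippet below computes a correlation matrix and flags outliers with the 1.5 × IQR rule. The toy data, the column names, the `iqr_outliers` helper, and the choice of the IQR rule are assumptions for illustration; the original analysis may have used a different outlier criterion.

```python
import pandas as pd

# Toy numeric data; values and column names are hypothetical.
df = pd.DataFrame({
    "eps": [12, 1, 24, 13, 2, 12, 500],
    "watched": [1500, 300, 9000, 450, 800, 2000, 1200],
    "rating": [3.9, 3.1, 4.4, 2.8, 3.5, 4.0, 3.2],
})

# Pairwise Pearson correlations; this matrix is what a heatmap would display.
corr = df.corr()


def iqr_outliers(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


outlier_mask = iqr_outliers(df["eps"])
print(corr.round(2))
print(df[outlier_mask])
```

The `corr` matrix can be passed to a plotting function such as `seaborn.heatmap` to produce the heatmap; plotting is left out here to keep the example free of extra dependencies.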
-
Model Evaluation
- Assessed performance using RMSE (Root Mean Square Error), MAE (Mean Absolute Error), and R² score.
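
For reference, the three metrics can be computed directly from their definitions. The sketch below uses NumPy only; the `regression_metrics` helper and the sample ratings are hypothetical and are not results from this project.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and R² for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = np.sqrt(np.mean(residuals ** 2))          # penalizes large errors more
    mae = np.mean(np.abs(residuals))                 # average absolute error
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": float(rmse), "mae": float(mae), "r2": float(r2)}


# Made-up ratings, for illustration only.
y_true = [3.0, 4.0, 2.0, 5.0]
y_pred = [2.5, 4.5, 2.0, 4.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE and MAE are in the same units as the rating, so they can be read as a typical prediction error, while R² is the share of rating variance the model explains. scikit-learn's `sklearn.metrics` module offers equivalent functions (for example `mean_absolute_error` and `r2_score`) if that library is already in use.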
Contributions, issues, and feature requests are welcome!
Feel free to open a pull request or issue.