Description
One of the biggest challenges we faced throughout our Labs experience was finding the best way to evaluate our model's performance.
If our model were to take a pure collaborative filtering approach, which is based solely on user reviews, we would have been limited to only the 10,000 books in the Goodbooks dataset. Recommendations, more often than not, fell outside this dataset, so no recommendations could be generated. We considered this approach too narrow.
However, one advantage of recommendations based on user reviews is that they turn our modelling approach into a regression problem, where a train-test split or cross-validation can be used and standard evaluation metrics (MAE, RMSE) apply. It may be possible to find additional user review data beyond the Goodbooks dataset (see here and here).
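As a rough sketch of what that regression-style evaluation could look like, the snippet below fits a matrix-factorization model on the Goodbooks ratings and reports MAE/RMSE on a held-out split. It assumes a local `ratings.csv` with `user_id`, `book_id`, and `rating` columns and uses the third-party `scikit-surprise` library; the SVD model and file path are illustrative, not what our project actually shipped.

```python
import pandas as pd
from surprise import Dataset, Reader, SVD, accuracy
from surprise.model_selection import train_test_split

# Hypothetical path to the Goodbooks ratings file (user_id, book_id, rating).
ratings = pd.read_csv("ratings.csv")

# Goodbooks ratings are on a 1-5 scale.
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[["user_id", "book_id", "rating"]], reader)

# Hold out 20% of ratings for evaluation.
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

# Illustrative model choice: matrix factorization via SVD.
algo = SVD()
algo.fit(trainset)

# Predict held-out ratings and compute the standard regression metrics.
predictions = algo.test(testset)
accuracy.rmse(predictions)
accuracy.mae(predictions)
```

The same split-and-score loop would let different model iterations be compared automatically instead of by manual inspection.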
Setting aside user review data, it wasn't clear to us at first how to generate these evaluation metrics, given that we had no clear indicator of "success" (no dependent y-variable). As we found out, this is the central challenge of unsupervised learning.
We opted to survey the other members of our team for books they had read, manually generate recommendations, and then create surveys for feedback on different model iterations. This was very time-intensive and does not scale well. If I were starting Labs over again, the first thing I would do is create an automated way to evaluate the recommendations and collect feedback. I experimented briefly with a rough HTML form to do this, but ran out of time to implement it (see below). Another suggestion for gathering feedback was a "Tinder-style" approach, where users swipe right or left on each recommendation.
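As a rough illustration of what that feedback loop might have looked like, here is a minimal Flask sketch along the lines of the HTML form I experimented with. The route, file name, and thumbs-up/down buttons are hypothetical placeholders rather than code from the project.

```python
from flask import Flask, request
import csv

app = Flask(__name__)

# Simple up/down form for a single recommended book.
FORM = """
<form method="post">
  <p>Was this recommendation a good fit? (book_id: {book_id})</p>
  <button name="feedback" value="up">&#128077;</button>
  <button name="feedback" value="down">&#128078;</button>
</form>
"""

@app.route("/feedback/<book_id>", methods=["GET", "POST"])
def feedback(book_id):
    if request.method == "POST":
        # Append each vote to a CSV so model iterations can be compared later.
        with open("feedback.csv", "a", newline="") as f:
            csv.writer(f).writerow([book_id, request.form["feedback"]])
        return "Thanks for the feedback!"
    return FORM.format(book_id=book_id)

if __name__ == "__main__":
    app.run(debug=True)
```

Collecting votes this way would give each model iteration a simple hit rate to compare against, without the manual survey round-trips.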