RecSys Week 1: This post is a brief summary of, and personal commentary on, the blog post ‘How Not To Sort By Average Rating’, written by Evan Miller and available here. It was written as a weekly reading for the Recommender Systems (IIC3633) course at the Pontifical Catholic University of Chile.
Which is better: a pair of Bluetooth headphones with just one rating of 5 stars, or another pair with over 5000 ratings averaging 4.5 stars? I’d prefer the second option, which is backed by a lot of users, whereas the first headphones were probably only rated by their producer.
A lot of people will find this logical, but then why do the algorithms of a web page show me the unvalidated headphones first? Because they weren’t designed to take into account the number of ratings given, and in most cases that is a terrible decision.
Normally, to take the number of ratings into account, a statistical estimate with a confidence interval is used. As in a country’s election polling, from a sample of the population it is possible to estimate the interval in which the result will fall, with a given confidence level. For example: “With a confidence of 95%, Rico McPato will get between 30 and 35% of the votes”, which roughly means that there is a 5% chance that the result will fall outside the given interval.
To incorporate this uncertainty into item ratings, we normally rank by the lower bound of the interval. The smaller the sample size (the number of ratings in this case), the wider the interval and the lower that bound. So, in the example given at the beginning, the headphones with just one rating could have a lower bound of just 1 star.
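To make the idea concrete, here is a minimal Python sketch of the lower bound of the Wilson score interval, which is the interval Evan Miller's post uses for positive/negative ratings. The function name, the z value of 1.96 (about 95% confidence) and the way I turn the headphones example into positive/negative counts are my own assumptions for illustration, not something taken from the original post.

```python
import math

def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion.

    positive: number of positive ratings
    total:    total number of ratings
    z:        normal quantile (1.96 is roughly 95% confidence)
    """
    if total == 0:
        return 0.0  # no ratings at all: nothing to back the item up
    p_hat = positive / total
    denominator = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z * z / (4 * total)) / total)
    return (centre - margin) / denominator

# Assumption: the single 5-star rating counts as 1 positive out of 1, and
# the 5000 ratings averaging 4.5 stars are treated as 4500 positive out of 5000.
one_rating = wilson_lower_bound(1, 1)
many_ratings = wilson_lower_bound(4500, 5000)
print(one_rating, many_ratings)
```

With these assumed counts, the single-rating headphones get a lower bound of about 0.21, while the well-reviewed ones get about 0.89, so sorting by this value puts the validated product first.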
Now, I think the author leaves out an important point related to cold starts. A cold start happens when a new product arrives, when a new user registers on a website, when users simply don’t rate items, and in other similar cases. In these situations, discovering novel products becomes difficult, and that is mostly unwanted by providers.
So, besides considering the significance of the ratings a product has, a developer needs to keep in mind a system for recommending new items. Such a system could be influenced by the date the item was added, its producer, its characteristics (which is called content-based filtering), and maybe even reviews from other sources. For a news website, it would be terrible not to recommend the new articles that come up daily, since they are the main reason a user visits a news site.
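As one possible illustration of using the addition date, the sketch below adds a freshness bonus that decays over time to whatever quality score an item already has (for example, a confidence-interval lower bound between 0 and 1). The function names, the maximum bonus of 0.5 and the 7-day half-life are hypothetical values I picked for the example; a real system would have to tune them.

```python
def freshness_boost(age_days, max_boost=0.5, half_life_days=7.0):
    """Bonus that starts at max_boost for a brand-new item and halves
    every half_life_days (all values are assumptions for illustration)."""
    return max_boost * 0.5 ** (age_days / half_life_days)

def ranking_score(quality_score, age_days):
    """Combine a rating-based quality score with the freshness bonus."""
    return quality_score + freshness_boost(age_days)

# A new item with a weak quality score versus an older, well-rated item.
new_item = ranking_score(quality_score=0.2, age_days=0)
old_item = ranking_score(quality_score=0.6, age_days=60)
print(new_item, old_item)
```

With these made-up numbers the new item is ranked first on its first day, and the bonus fades after a few weeks, so in the long run the ratings decide again.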
Thank you for reading. I will gratefully accept any corrections, comments, or related reading. With this post I do not intend to misrepresent or criticize the original work, from which most, if not all, of the contents of this post were obtained.