Let's science the sh*t out of Toilet Paper Reviews
Toilet paper: a trivial household item, yet one that touches our everyday lives. Amazon.com lists over 50 brands of toilet paper across more than 9,000 product listings. How exactly does one pick the right toilet paper, ideally in a quantified, scientific way?
Let us look at the possible methods of evaluation:
- Physical / mechanical tests
- Consumer surveys
- Online reviews
Physical / Mechanical Tests
These often involve analyzing the physical characteristics of TP using tests such as the following:
- Water absorption test
- Paper strength tests using a mechanical puncturing device
- Tearing tests
The advantage of such evaluations is objectivity: the tests, if conducted properly, provide precise and repeatable measurements.
The issue with this kind of evaluation, though, is that it is often impossible to devise tests that grade every aspect of a product or capture actual consumer experiences - for example, what kind of test can be used to quantify "comfort"?
Consumer Surveys
A possible solution to the above problem is to involve the consumers in the grading process - after all, who better to measure a product than its end users!
Surveys work great if designed right. However, designing a survey that elicits unbiased answers is challenging due to factors like response bias. Even if we could come up with a questionnaire that records participants' true responses across all aspects of the product, keeping it updated over time gets challenging.
Online Reviews
Reviews are informative nuggets of free text, already available for most products sold on the internet. They are the voice of the consumers in a public forum.
"People often talk about the things they care about"
… and the same is true for online reviews.
By itself, a single review represents an individual’s experience - but these become even more interesting when we start to look at repeated patterns across many reviews.
What if we could use this information to scientifically come up with conclusions about the product?
Using neural networks and natural language processing, we can extract the topics and sentiment expressed in reviews, enabling us to quantify objective metrics for the product.
Now let’s try and analyze TP reviews from Amazon.com.
After crawling Amazon.com to gather reviews, we need to preprocess the data to address the following issues:
- Unverified reviews - These are reviews by people who have not bought the product from the marketplace. Since there is no evidence that these buyers are genuine, we do not consider them.
- Old reviews - Consumer expectations and products change over time. Keeping this in mind, we remove reviews that are more than a couple of years old.
- Review spam - Many brands and sellers create "fake" reviews or incentivize reviewers by giving away free samples of their products. This introduces significant biases into a product's review data. We can mitigate this by running outlier detection on reviewer histories to identify products with a significant share of spammy reviews.
- Product variants - Some websites like Amazon group reviews of a bunch of products into the same listing - thankfully, Amazon does provide a (somewhat hidden - “show only this format”) option to filter the reviews by variant.
- Too few reviews - it is difficult to draw meaningful conclusions based on a small number of reviews, hence for this blog post, we discard products with fewer than 200 reviews.
After the above cleanups, we end up with 29,574 reviews across 274 products (considering only up to the 1,000 most recent reviews for each product).
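The filtering steps above can be sketched with pandas. This is a minimal illustration, not the post's actual pipeline; the column names (`product_id`, `verified`, `date`, `text`) and thresholds are assumptions.

```python
import pandas as pd

def filter_reviews(df: pd.DataFrame, min_reviews: int = 200,
                   max_age_years: int = 2,
                   max_per_product: int = 1000) -> pd.DataFrame:
    """Apply the cleanup steps: verified-only, recent-only,
    capped per product, and a minimum review count per product."""
    # keep only verified purchases
    df = df[df["verified"]]
    # drop reviews older than the cutoff
    cutoff = pd.Timestamp.now() - pd.DateOffset(years=max_age_years)
    df = df[df["date"] >= cutoff]
    # keep only the most recent reviews for each product
    df = (df.sort_values("date", ascending=False)
            .groupby("product_id").head(max_per_product))
    # discard products with too few remaining reviews
    counts = df["product_id"].value_counts()
    keep = counts[counts >= min_reviews].index
    return df[df["product_id"].isin(keep)]
```

Spam and variant filtering are omitted here, since they depend on reviewer-history data and marketplace-specific metadata.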
We can now cluster the review sentences to extract the most talked-about topics in the reviews. The following recurring topics start to emerge:
| Topic Cluster | Sample Phrases |
| --- | --- |
| Overall | "the product was great", "did the job" … |
| Marketplace / Delivery | "was delivered on time", "doorstep delivery" … |
| Paper Quality | "strong", "durable", "doesn't tear" … |
| Comfort | "soft", "cushy", "softest TP out there", "little ruff on the bum" … |
| Value For Money | "worth the price", "bargain", "rip off" … |
| Lasts Long | "lasts forever", "ran out in a day", "larger than expected" … |
| Cleaning | "squeaky clean bum", "leaves quite a bit of fuzz behind" … |
| Clogging | "clogs the sewer line", "gentle on the plumbing", "had to use the plunger" … |
| Dispenser | "barely fit my holder", "might not fit in your dispenser" … |
| Gag Gift | "gag gift for my roommate", "gift as a prank" … |
| Eco-Friendly | "environmentally friendly", "made of recycled paper" … |
| Travel | "carry in your purse or pocket", "great for travel" … |
| Perfumed TPs | "smell is so strong", "smell soooooo good" … |
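The post does not disclose its actual clustering method, so here is a toy sketch of the idea using TF-IDF features and k-means as stand-ins; the sample sentences are drawn from the table above, and the cluster count is arbitrary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# a handful of review sentences from the topic table above
sentences = [
    "the product was great", "did the job",
    "was delivered on time", "doorstep delivery",
    "soft and cushy", "softest TP out there",
]

# vectorize each sentence, then group them into k clusters
vec = TfidfVectorizer()
X = vec.fit_transform(sentences)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

for label, sent in zip(km.labels_, sentences):
    print(label, sent)
```

A production pipeline would more likely use neural sentence embeddings, which capture that "delivered" and "delivery" (or "soft" and "cushy") belong together even when the surface words differ.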
For the purpose of this exercise, we will not consider the top two topics in the list above: the "Overall" statements are too generic and loosely used, and volume-wise they dominate every other topic, while the "Marketplace / Delivery" topic is not directly related to the product itself. We discard the four low-volume clusters at the bottom as well.
Let’s look at the final set of clusters with their volume distribution:
All the clustered sentences are then passed through our custom sentiment classifier to determine positive or negative polarity. This gives us a topic-wise polarity for each product. We can now compare topics across products to arrive at an overall product ranking.
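One simple way to turn per-sentence polarities into a ranking is sketched below. This is an illustration, not the site's actual scoring formula: it assumes each clustered sentence carries a `(product, topic, polarity)` triple with polarity +1 or -1, scores each topic by its share of positive sentences, and averages topic scores per product.

```python
from collections import defaultdict

def rank_products(rows):
    """rows: iterable of (product, topic, polarity) triples,
    polarity being +1 (positive) or -1 (negative).
    Returns product names, best first."""
    # collect polarities per (product, topic) pair
    buckets = defaultdict(list)
    for product, topic, polarity in rows:
        buckets[(product, topic)].append(polarity)
    # per-topic score = share of positive sentences
    topic_score = {k: sum(1 for p in v if p > 0) / len(v)
                   for k, v in buckets.items()}
    # product score = mean of its topic scores
    per_product = defaultdict(list)
    for (product, _topic), score in topic_score.items():
        per_product[product].append(score)
    overall = {p: sum(v) / len(v) for p, v in per_product.items()}
    return sorted(overall, key=overall.get, reverse=True)
```

Averaging across topics (rather than across raw sentences) keeps a high-volume topic from drowning out the others, echoing why the dominant "Overall" cluster was excluded earlier.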
Note that we not only have a ranking, but also a justification - for example, the worst product is a tubeless TP, which seems to have a lot of holder / dispenser related complaints.
Why stop at just toilet papers?
We at TheReviewIndex.com believe that similar techniques can be used for analyzing all kinds of reviews - which is why we've just launched a tool that creates spam-filtered review summaries for any electronics / gadget / appliance listing on Amazon.com.
You can try it out here: