- By Meng Xuan Xia
In this special edition of SweetIQ’s Dev Desk column, staffer Meng Xuan Xia combines an unlikely trifecta: sentiment analysis, the Dollar Shave Club’s “One Wipe Charlie” product, and consumer reviews. Whether you’re a developer, a fan of the marketing genius behind DSC, or someone who just gets a kick out of hilarious online reviews, read on to see how Meng’s original hypothesis unfolds.
Hey, everybody! Meet Andy. Andy is a Distributed System Engineer here at SweetIQ. Andy and his developer friends are confronted with all sorts of challenges in their aim to implement distributed systems correctly: messages get delivered out of order, misdelivered, or delivered multiple times. Like many other wise engineers, Andy relies on a “publisher subscriber message broker” to handle the data-rich message queues. But sometimes the queue gets stuck in production and has to be fixed right then and there. The naive “quick fix” solution is to wipe the message queue so that the entire system restarts in a clean state.
But it’s not always that simple.
Furious at this problem, Andy started making jokes about the challenges a developer encounters when vying for a clean slate. In his state of fury, he turned to the good ol’ Internet for support and answers. The Internet’s idiosyncrasies work in strange and beautiful ways, and Andy’s path led him to rediscover this viral video campaign from the Dollar Shave Club guys.
The product in question is called One Wipe Charlie – Buttwipes made for men. And wow, those guys have some seriously good reviews.
At the time of my “research exploration”, I was working on some sentiment analysis experiments over restaurant reviews, aiming to give our users a nice tool for understanding what their customers think of their businesses or products. Working with restaurant review data is manageable; it’s well studied and well understood. But, at the same time, it can be dry and boring. So, naturally, I made the funny reviews of the One Wipe Charlie product my main data set. All in a day’s work, people!
In academia, sentiment analysis has been studied for decades. Scholars mainly try to use computing to categorize opinions expressed in text such as reviews, comments and so on.
The traditional methods are usually simple, employing lexical features, like counting the number of positive words and negative words in a review. If there are more words like awesome, good, and exceptional than awful, bad and horrifying, in a review, then we would say that the review is positive, and vice versa.
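Here is a minimal sketch of that counting approach. The word lists are just the six examples from the paragraph above, and the function name `lexical_sentiment` is made up for illustration; a real lexicon would be far larger.

```python
import re

# Tiny illustrative word lists; a real sentiment lexicon would be far larger.
POSITIVE = {"awesome", "good", "exceptional"}
NEGATIVE = {"awful", "bad", "horrifying"}


def lexical_sentiment(review):
    """Label a review by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"  # a tie, or no sentiment words at all


print(lexical_sentiment("Awesome wipes, good price, bad packaging."))
```

The sample sentence comes out as positive overall (two positive words against one negative), even though it contains a specific complaint about the packaging.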
But there are clear limitations with these lexical methods.
Say I’m a restaurant owner, and I know my restaurant has generally negative reviews. But exactly which part am I doing wrong? Is it that my fries are too greasy? Is my serving staff rude? Did they complain about the cleanliness of the space?
The need for these more complex answers motivates a more detailed branch of sentiment analysis, called Aspect-Based Sentiment Analysis. This industry buzzword is actually an ongoing research field. And remember this term, because it’s going to be central to our entire story here.
And, in case you’re not a business owner and you’re like “phhhh, reviews! Who cares!”, just know that reviews and online review management in general are essential for any business that wants an online presence with an active pulse.
So here we are, we have amazingly positive reviews from an unlikely product. We have sentiment analysis that needs to be improved for our clients. How to solve the problem? Like any good dev researcher, we ask questions:
Why exactly do those people like the One Wipe Charlie product?
(Writer’s note: Right now as I write this article, I’m still not sure if this one will get published to our blog. I’m almost certain that Colleen was expecting a post similar to stories like Schemats – statically typed postgresql query with typescript. But when you ask a developer to write a technical article on his day-to-day… well, anything can happen.)
The Journey Begins
One Wipe Charlie has over 2,300 overwhelmingly positive reviews. We don’t really have the free time to go through all of them. But we are still curious to understand what people like about the wipe and, most importantly, what exactly the people who hated it hated about it. To understand that, we need to cluster each sentiment into senti-aspects.
Imagine this: you are a store manager who receives letters from your customers, and you need to organize these letters into different buckets. You have gold buckets A and B, which hold the letters that say positive things about your store. Gold bucket A contains the letters that praise the good qualities of your products. Gold bucket B has those that describe your superb customer service. Those buckets go into the truck and get sent to your boss. There are also the paper buckets, which are meant to hold the letters describing how awful things are. Those buckets… They ain’t going anywhere.
Having aspect-based sentiment analysis is basically having those letters put into the correct buckets for you, automatically.
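To make the bucket idea concrete, here is a toy sketch that sorts sentences into (aspect, gold/paper) buckets using hand-written keyword rules. To be clear, this is not the model used later in this post. The keyword sets and the function names `bucket_for` and `sort_into_buckets` are hypothetical, and a real system would learn the aspects instead of having them typed in.

```python
import re
from collections import defaultdict

# Hypothetical keyword rules, for illustration only.
ASPECT_KEYWORDS = {
    "product": {"clean", "fresh", "scent", "smell", "peppermint"},
    "packaging": {"package", "pull", "sheet"},
    "price": {"price", "cheap", "expensive"},
}
POSITIVE = {"love", "great", "nice", "good"}
NEGATIVE = {"difficult", "hard", "problem", "bad"}


def bucket_for(sentence):
    """Return an (aspect, bucket kind) pair for one sentence, or None."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # Pick the first aspect whose keywords appear in the sentence.
    aspect = next(
        (name for name, keys in ASPECT_KEYWORDS.items() if words & keys), None
    )
    if aspect is None:
        return None
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score == 0:
        return None  # no clear sentiment, so the letter stays on the desk
    return (aspect, "gold" if score > 0 else "paper")


def sort_into_buckets(sentences):
    """Group sentences by the bucket that bucket_for assigns to them."""
    buckets = defaultdict(list)
    for sentence in sentences:
        key = bucket_for(sentence)
        if key is not None:
            buckets[key].append(sentence)
    return dict(buckets)


letters = [
    "I love the fresh scent.",
    "The sheets are difficult to pull out of the package.",
    "Great price.",
]
for bucket, contents in sort_into_buckets(letters).items():
    print(bucket, contents)
```

Hand-written rules like these break down quickly (note that "sheets" does not even match "sheet"), which is part of why an unsupervised model is attractive.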
Getting hold of all the review data is easy: we just need to open the developer console, observe how the review data gets loaded via XMLHttpRequests, and then write a script to download all the reviews automatically.
Here, I rightfully deprive the reader of the joy of discovering the XHRs on their own by providing a ready-made script.
    #!/bin/bash
    # Fetch the reviews 100 at a time, 24 pages in total.
    limit=100
    for i in $(seq 24); do
      offset=$(( (i - 1) * limit ))
      curl "https://api.bazaarvoice.com/data/reviews.json?apiversion=5.4&passkey=p50vk4tqemhkpchqh91467s6v&Filter=ProductId:OWC-40C-3&Include=Products&Stats=Reviews&Sort=IsFeatured:desc,SubmissionTime:desc&Offset=$offset&Limit=$limit" -o "charlie_wipe_review.part$i.json"
    done
This will download all of the Charlie reviews in JSON format. But JSON is not a particularly friendly format for data analysis, so we convert the reviews into tab-separated values (TSV) using this overly indented Python script.
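    #!/usr/bin/env python
    import json

    with open('charlie.tsv', 'w') as g:
        # The download script wrote parts 1 through 24, so the range must end at 25.
        for i in range(1, 25):
            with open('charlie_wipe_review.part%s.json' % i) as f:
                j = json.load(f)
                results = j["Results"]
                for r in results:
                    review_id = r['Id']
                    rating = r['Rating']
                    # Replace newlines with spaces so each review stays on one TSV row.
                    text = r['ReviewText'].replace('\n', ' ')
                    # Columns: id, id again (as in the original script), product label, rating, text.
                    g.write(review_id + '\t' + review_id + '\t' + 'charlie' + '\t' + str(rating) + '\t' + text + '\n')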
This will save all 2,300 reviews as a single TSV file.
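As a quick sanity check on a file like that, one could count the reviews per star rating. The sketch below assumes the column order written by the conversion script (id, id, product label, rating, text). The function name `rating_histogram` and the three sample rows are made up for illustration, not real data.

```python
import csv
import io
from collections import Counter


def rating_histogram(tsv_file):
    """Count reviews per star rating from rows shaped like charlie.tsv."""
    counts = Counter()
    # QUOTE_NONE keeps quotation marks inside review text from confusing the parser.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue  # skip malformed lines
        counts[int(row[3])] += 1
    return counts


# Made-up sample rows in the same column order the conversion script writes.
sample = io.StringIO(
    "1\t1\tcharlie\t5\tFeels clean and fresh.\n"
    "2\t2\tcharlie\t2\tHard to pull out of the package.\n"
    "3\t3\tcharlie\t5\tGreat price.\n"
)
print(rating_histogram(sample))
```

For the real file, pass `open('charlie.tsv', newline='')` instead of the in-memory sample.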
Modeling the Review Space
In the next step, I build an unsupervised Aspect and Sentiment Unification Model (ASUM) with the review data as input, following the algorithm proposed by Yohan Jo and Alice Oh (2011). Unfortunately, I can’t share any of the source code for the moment, so you, dear reader, will have to use your imagination.
    #!/usr/bin/env python
    import imagination
    import closedsourcedlib

    with open('charlie.tsv') as f:
        closedsourcedlib.build_the_model(f)
With the resulting model, we can cluster the sentiments expressed in individual review sentences into buckets of senti-aspects. Each bucket is summarized by the most frequent words in the review sentences that represent that senti-aspect.
The algorithm can list the most probable words in each senti-aspect bucket. Let’s take a peek at those words:
| Gold Bucket 1 | Gold Bucket 2 | Paper Bucket 1 | Paper Bucket 2 |
|---|---|---|---|
| clean (0.101) | wipe (0.053) | pull (0.072) | s (0.029) |
| fresh (0.070) | good (0.039) | wipe (0.064) | like (0.029) |
| feel (0.059) | product (0.028) | package (0.059) | wipe (0.029) |
| wipe (0.047) | like (0.025) | time (0.044) | charlie (0.025) |
| leave (0.043) | little (0.022) | difficult (0.042) | smell (0.025) |
| feeling (0.039) | great (0.022) | hard (0.033) | peppermint (0.021) |
| love (0.033) | price (0.020) | end (0.022) | bad (0.017) |
| smell (0.025) | buy (0.017) | sheet (0.022) | winner (0.017) |
| minty (0.023) | store (0.015) | problem (0.020) | charlies (0.017) |
| nice (0.021) | use (0.014) | edge (0.020) | work (0.017) |
| scent (0.021) | big (0.014) | complaint (0.018) | fresh (0.017) |
| peppermint (0.020) | feel (0.013) | issue (0.018) | go (0.017) |
Take a look at the most probable words in each bucket above and it’s very likely that you can guess what most reviewers said.
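For intuition about where a number like “clean (0.101)” comes from, here is a simplified sketch that turns per-bucket word counts into probabilities and keeps the top few. The counts are made up, the function name `top_words` is hypothetical, and plain relative frequencies stand in for the probabilities that the real model estimates during fitting.

```python
from collections import Counter


def top_words(word_counts, k=3):
    """Return the k most probable words in a bucket with their probabilities."""
    total = sum(word_counts.values())
    return [
        (word, round(count / total, 3))
        for word, count in word_counts.most_common(k)
    ]


# Made-up counts for one bucket; real values would come from the fitted model.
bucket = Counter({"clean": 50, "fresh": 30, "feel": 15, "wipe": 5})
for word, probability in top_words(bucket):
    print(word, probability)
```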
Gold Bucket 1 contains the reviews saying that the wipe cleans well and leaves a nice minty scent. Some of the representative reviews in this bucket are the following:
“I love these wipes! And because of the peppermint, it makes you feel even fresher!”
“This product made me feel… Like dancing… I’m kidding. Feels clean and fresh. Works as described. P.S. I wrote the clean and fresh thing before scrolling down and seeing it”
Gold Bucket 2 contains reviews from customers who felt the price was right for this good product. Some of the representative reviews in this bucket are the following:
“Great products at a low price, no hassle of going to the store. On time deliveries.”
“Great quality. Great price. Just great. Not sure why I never used something like this in the past but now I can’t live without. Well actually I can live without but really don’t want to.”
How about the negatives? Paper Bucket 1 describes problems with the packaging.
“While I like the premise, smell, and ultimate utility of the wipes it is often (quite often) extremely difficult to remove a wipe from the package. This often triggers a frustrated “scrape” of the top of the stack which results in a clump of wipes coming out of the package. After several packages I have stopped buying for this reason. Let me know if you get this figured out and I’ll give them another try.”
“Only criticism is they are very difficult to pull out of the package initially (for the first 10 sheets or so).”
Not everyone likes the peppermint: Paper Bucket 2 also contains the highly frequent keyword peppermint. Let’s see why people didn’t like peppermint:
“When I first ordered these wipes they seem to be very “pepperminty”. They used to feel so cool and refreshing, now there seems to be less peppermint. They feel like normal wipes now.”
Andy says: “You should have a positive attitude towards negative reviews!”
As it stands, the local search industry treats bad reviews as threats that need to be tended to within the first minute they go live, usually with an apology or a denial. But our research suggests that negative reviews can be a great opportunity to engage with consumers in a meaningful, productive way. Goodbye to the days of a knee-jerk, write-it-and-forget-it response.
So when Andy’s confronted with a problem again… will he just wipe the queue and start over!? No. Being able to process and classify all types of reviews — and their particularities — is key for our clients. As for One Wipe Charlie? Well, we thank the Internet for bringing our attention to a true classic in the viral video world, and keep up the great work, Dollar Shave Club!
Find out how SweetIQ can help your brand win the review game.
Disclaimer: This article was not sponsored by Dollar Shave Club.