An analysis of sentiment analysis

Go Fish
Sentiment analysis is becoming the topic du jour (note the italics for the foreign word? Thanks Ben) in social media circles. It’s even hit the mainstream, with interest in how brands are using data mining for commercial advantage. The semantic web is most definitely at our doorstep, yet we seem to be expecting our machines to understand complex human emotions when even men don’t seem to realise that when a woman says she’s fine it most certainly does not mean she’s fine.

Take this example, from the (comprehensive) Digital Media article that outlines where to get your election news online. Of the six articles listed as “Hot”, one is a remix of Gillard’s Moving Forward catchphrase and one is a plea for Tony Abbott to man up. A third should have been listed as neutral as the Twitpic of Bob Brown merely stated his whereabouts (and his company).

Add to this our habit of using litotes, tropes and double negatives and it’s obvious that sentiment analysis will need to rely on more intelligent text processing than we currently have at our disposal. We’re trying to combine a lexicon with an ontology hierarchy and human emotion/reason with categorical attributes. So what’s the solution?
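Before getting to solutions, here is a rough sketch of why the word-list approach trips over litotes and double negatives. It is only an illustration: the tiny lexicon, the list of negators and every function name are made up for this post, and no real sentiment tool is claimed to work exactly this way.

```python
import re

# Hypothetical toy lexicon (word -> polarity); invented for illustration only.
LEXICON = {"good": 1, "great": 1, "fine": 1, "love": 1,
           "bad": -1, "terrible": -1, "hate": -1, "stink": -1}
# Hypothetical list of negating words.
NEGATORS = {"not", "no", "never", "hardly"}


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Add up the polarity of every known word, ignoring context."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


def negation_aware_score(text):
    """Same sum, but a negator flips the next polarity word it meets."""
    score = 0
    negate = False
    for tok in tokenize(text):
        if tok in NEGATORS:
            negate = not negate  # two negators in a row cancel out
            continue
        polarity = LEXICON.get(tok, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


def label(score):
    """Squeeze a number into the three buckets."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(label(naive_score("That was not bad at all")))           # Negative
print(label(negation_aware_score("That was not bad at all")))  # Positive
print(label(negation_aware_score("I'm fine")))                 # Positive
```

The negation rule rescues "not bad", but "I'm fine" still lands in the Positive bucket either way, because nothing in the words themselves carries the tone.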

Most solutions stink. Not just stink… dinosaur’s breath after a meal stink. We are algorithmically trying something that as yet does not lend itself to algorithmic measurement… “emotion”. It is darn near impossible to cleanly bucket feelings and nuance into clean Positive, Negative, Neutral buckets.

We, computer programs, are simply not there yet. [Though I am absolutely confident that we will get there at some point.]

For now you are most likely wasting time (and money). Sorry.

…Rather than trying to find short cuts, where none exist, and provide aggregate data, where it just gets crapified, follow a well established methodology while leveraging segmentation and nuance.

Source: Occam’s Razor by Avinash Kaushik (read the rest of the article to find out how Avinash knows how many people have shared his words of wisdom!) (In fact, read the rest of the article anyway and subscribe to his blog. You won’t be disappointed.) (Ok well maybe you’re the type to be disappointed but that’s not really my problem is it? Seriously, stop being a jerk.)




~ by mandi bateson on July 27, 2010.

4 Responses to “An analysis of sentiment analysis”

  1. Agree – there’s a long way to go here. And you know what, the cost of an intern to run an eye over your keyword reporting is likely to be a much better investment in the longer term than the licensing of some software. +1 for the humans 😉

  2. Agreed. Language is one of the most fluid forms of expression in that it constantly changes. Sure, in the future it may be possible, but this would entail teaching a computer to learn, analyze and adapt to changes in language, not discounting the fact that there are about 6900+ different languages in the world (not counting slang, etc.). It’s easy to say “we should automate this”; it’s another matter to actually find a way to do it, perfect the method and sell it for a reasonable price. At our company, although we promote sentiment analysis, we do not remove humans from the equation. Computers are a brilliant thing but we have yet to find anything that can surpass the human mind.

  3. […] personal favourite was a post inspired by Avinash Kaushik from Occam’s Razor which is an analysis of sentiment analysis because of what it got me thinking about language and the human […]
