Tutorial for feature-level sentiment analysis

Heads up!

This tutorial was written for Textalytics and, as such, has become obsolete. You can read the updated version for MeaningCloud in this post.

MeaningCloud provides an API for advanced opinion mining, Sentiment Analysis, which extracts a global aggregated polarity for the text as well as a more in-depth analysis: a sentence-level breakdown of the polarity, plus the entities and concepts found and the sentiment associated with each of them.

Figure: cover of Marvel’s Black Widow #1

What sets the MeaningCloud Sentiment Analysis API apart is that you can define entities and concepts in each call to the API, which lets you obtain the same detailed sentiment analysis for entities or concepts specific to the domain of your application.

We are going to use comic book reviews to learn how to use this feature, as it’s a very rich domain in which it’s easy to illustrate how useful user-defined concepts and entities can be. The same approach applies to any other field where sentiment comes into play, such as hotel reviews, Foursquare tips, Facebook status updates or tweets about a specific event.

How do we do this?

The first step is, of course, to register for free at MeaningCloud (if you haven’t already) and request a license for the Sentiment Analysis API.

Once you have the key (you can always check it in your personal area), you have everything you need to take a first look at the results Sentiment Analysis provides. There are two ways you can do this: by using the test console provided, or by downloading a starting client in your favorite language from the SDK section.

The following script is an example of how a basic call to the API would be done in PHP:
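As a minimal sketch, a call along these lines could be written as follows. The endpoint URL is a placeholder, and the `key`, `lang` and `txt` parameter names are assumptions for illustration rather than confirmed API details; only the `$parameters` array name comes from the snippets in this tutorial. The helper functions `build_parameters` and `call_api` are hypothetical names.

```php
<?php
// Sketch only. API_URL is a placeholder, and the 'key', 'lang' and 'txt'
// parameter names are assumptions; use the values from the API documentation.
const API_URL = 'https://api.example.com/sentiment'; // placeholder endpoint

/** Collects the request parameters in a single array. */
function build_parameters(string $key, string $text, string $lang = 'en'): array
{
    return [
        'key'  => $key,  // license key from your personal area
        'lang' => $lang, // language of the text (assumed parameter name)
        'txt'  => $text, // text to analyze (assumed parameter name)
    ];
}

/** Sends the parameters as a form-encoded POST and returns the raw response body. */
function call_api(string $url, array $parameters): string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($parameters),
            'timeout' => 30,
        ],
    ]);
    $response = @file_get_contents($url, false, $context);
    if ($response === false) {
        throw new RuntimeException('Request to the API failed');
    }
    return $response;
}

$parameters = build_parameters(
    'YOUR_LICENSE_KEY',
    'This is the book Black Widow and comic fans deserve.'
);

// Uncomment to send the request and print the response:
// echo call_api(API_URL, $parameters), PHP_EOL;
```

Keeping the parameters in one array means that optional parameters can be added later as extra keys before the request is sent.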

In the response received (printed by the previous script), there are three main types of data:

  • Global analyses of the text, which include polarity, irony and subjectivity detection.
  • In-depth analysis at sentence level, with a detailed breakdown of the keywords found and the entities and concepts in each sentence (referred to as ‘segments’).
  • Aggregated analysis of the sentiment associated with the entities and concepts found in the text (feature-level sentiment analysis).

This post focuses on the last of these. In a sentence such as “It’s not a horrible book by any means, it’s merely the storyline that’s stunningly mediocre.”, it is what makes it possible to identify that the polarity associated with “book” is different from the one associated with “storyline”.

Dealing with entities specific to your own domain

Secret identities are par for the course in the comic book domain. In terms of name recognition, this means that all the possible aliases of a character have to be known in order to correctly detect every reference to that entity. Comic book characters number in the thousands, so it is unlikely that an API’s built-in resources will contain this kind of information. The following text has been extracted from a review of Black Widow #1:

I could go on and on about this book. This was everything I wanted in a BLACK WIDOW comic. Nathan Edmondson comes out of the gate running. He has a great take on Natasha and reading this issue will immediately make you want more. Phil Noto’s art is insanely good. He creates a fantastic mood full of energy and makes Natasha look great without over sexualizing her. This is a comic anyone can easily dive into. Buy an extra copy or two and give them to your friends or loved ones. This is the book Black Widow and comic fans deserve. How many days until issue two?

In this text we have the example of Natasha Romanoff, code name “Black Widow”. The following is the sentiment analysis associated with entities that we’d obtain with a basic call:

ENTITIES – Polarity [weight]
----------------------------
Natasha – P+ [0.80]
Phil Noto – P+ [1.00]
Black Widow
Nathan Edmondson

Black Widow and Natasha are detected as two different entities. When what we want is the aggregated polarity of an entity, we need to be able to detect all of its appearances. With this in mind, Sentiment Analysis lets you specify entities and their aliases in the input through a parameter called “entities”. In this case we want the analysis to know that Black Widow and Natasha Romanoff are the same person, so we’d add the following line to the script we included previously:

$parameters['entities'] = "Natasha Romanoff|Natasha|Black Widow";

Pipe characters separate the different aliases, and the first one in the list is used as the representative name.
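If the aliases live in an array rather than in a hand-written string, the value can be assembled programmatically. The sketch below assumes only the format described above (aliases joined by pipes, representative first); `build_entities_parameter` is a hypothetical helper, not part of any SDK.

```php
<?php
/**
 * Builds the value of the 'entities' parameter for one entity.
 * The first element is the representative name; the rest are its aliases.
 */
function build_entities_parameter(array $aliases): string
{
    foreach ($aliases as $alias) {
        // A pipe inside a name would be read as a separator, so reject it.
        if (strpos($alias, '|') !== false) {
            throw new InvalidArgumentException("Alias contains a pipe: $alias");
        }
    }
    return implode('|', $aliases);
}

$parameters = []; // in the full script this array already holds the other parameters
$parameters['entities'] = build_entities_parameter(
    ['Natasha Romanoff', 'Natasha', 'Black Widow']
);
echo $parameters['entities'], PHP_EOL; // Natasha Romanoff|Natasha|Black Widow
```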

 

With this new parameter, the script returns all the appearances of the entity unified under its representative name, along with their aggregated polarity:

ENTITIES – Polarity [weight]
----------------------------
Natasha Romanoff – P+ [0.80]
Phil Noto – P+ [1.00]
Nathan Edmondson

Analyze the sentiment of your own aspects

The same thing can be done with concepts. In this case we want to look at a review of Secret Avengers #16 and what it says about different aspects of the comic:

This was one of the most intriguing, dynamic storylines of the past year and it even had the decency to resolve itself in a satisfactory, satisfying way. The characters are hardly unscathed, but most of them may never remember exactly how they were scathed, as it were. There’s an amazing image of Maria Hill near the end of this issue where she gets a closeup that sums up so very much of this title. She knows what she has to do, but she hates that she has to do it. That’s the thinking that permeated the entire run.

Again, if we just carry out the basic call to Sentiment Analysis, we’ll see the sentiment associated with the basic concepts detected:

CONCEPTS – Polarity [weight]
----------------------------
character – P [0.60]

As with entities, Sentiment Analysis provides an input parameter to specify the concepts whose polarity we want to extract. This feature is especially useful when we are working in domains where the concepts we are interested in are too generic to appear in the basic extraction (such as “image” or “issue” in this example), or in domains full of technical language not covered by most APIs (for instance, the medical field).

The following line is what we’d need to add to the script to define three new concepts in our sentiment analysis:

$parameters['concepts'] = "issue\r\nstoryline|arc\r\nimage";

Each concept goes on its own line (the \r\n sequences in the string above), and synonyms are defined using the pipe character.
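The same value can be generated from a nested array, which is easier to maintain as the list of concepts grows. This is a sketch under the format just described (one concept per line, synonyms joined by pipes); `build_concepts_parameter` is a hypothetical helper name.

```php
<?php
/**
 * Builds the value of the 'concepts' parameter.
 * Each inner array is one concept followed by its synonyms.
 */
function build_concepts_parameter(array $concepts): string
{
    $lines = [];
    foreach ($concepts as $synonyms) {
        // Synonyms of the same concept share a line, separated by pipes.
        $lines[] = implode('|', $synonyms);
    }
    // Concepts are separated by a line break.
    return implode("\r\n", $lines);
}

$parameters = []; // in the full script this array already holds the other parameters
$parameters['concepts'] = build_concepts_parameter([
    ['issue'],
    ['storyline', 'arc'],
    ['image'],
]);
```

The resulting string is identical to the hand-written one above.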

And this is the result we obtain:

CONCEPTS – Polarity [weight]
----------------------------
storyline – P+ [0.68]
character – P [0.60]
image – P+ [0.80]
issue

The same methodology we’ve applied to detect superheroes and concepts specific to comics can be used to analyze, easily and in depth, any domain where the objective is to extract information about a limited set of items. A very common example would be analyzing cell phone reviews, where we’d add the different models we are interested in as entities, and the different features (screen, battery, etc.) as concepts.

Now you have both the tools and the knowledge to extract sentiment in any domain!

