Before setting out on a quest to discover customer expectations, preferences, and aversions, you must know your allies, enemies, and weapons.
But be on guard: dangers lurk on the horizon…
Customers talk. They have voluntary, passionate, and sincere conversations in forums, social networks, instant messaging apps, and elsewhere. Listening to them, understanding what they mean, and acting upon that knowledge is directly related to improving the user experience and, as a result, your organization’s profitability.
In recent years, the picture has changed radically. For starters, sources have multiplied. Furthermore, the conversation goes on 24 hours a day, 365 days a year. It is often multilingual and frequently overlooks the most basic spelling rules. It is natural language and as such, it is unstructured: it is neither stored in a traditional database nor organized according to predefined criteria.
If you are charged with analyzing the voice of the customer, you probably won’t have enough human resources to reach, read, classify, interpret, or extract value from such large volumes of data, let alone do it all in real time.
As a result, many of these valuable resources remain untapped.
Only an automatic or semi-automatic processing of the massive sources of unstructured data can adequately perform this analysis with the necessary quality, volume, timing, and consistency.
Why do you need your own spiders?
A number of commercial tools, such as Radian 6 or BrandWatch, offer access to several sources of text to analyze the voice of the customer. Some claim to access up to 80 million sites.
There are 3 problems with these all-purpose crawling solutions:
- Radian 6 and the like are broad in scope but shallow. Their crawlers reach many places, but they rarely do more than scratch the surface of most websites.
- They hardly understand the needs of your business. They are all-purpose. What if you are interested in listening to the voice of video game players? Despite their broad approach, the out-of-the-box solutions are very unlikely to tap into the scores of sites where gamers actually voice their opinions.
- Forums are tough to crawl. Twitter, for example, is an easy source to extract information from. Most blogs are not overly complicated either. But many times, the most valuable source of the voice of the customer lies hidden in web-based forums. They are an entirely different story for crawling.
In our experience, if you are serious about discovering the voice of the customer, you will need spiders programmed ad hoc to reach what you really need to analyze.
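To illustrate why forums need ad-hoc programming, here is a minimal sketch of the kind of page-specific extraction such a spider performs, using only the Python standard library. The sample markup and the `post-body` class name are assumptions for illustration; every real forum uses its own markup, which is precisely the point.

```python
from html.parser import HTMLParser

# Hypothetical forum markup: each opinion sits in <div class="post-body">,
# mixed with noise (signatures, navigation) a spider must skip.
SAMPLE_PAGE = """
<html><body>
  <div class="post-body">The new patch ruined the controls.</div>
  <div class="signature">Sent from my phone</div>
  <div class="post-body">Works fine for me after a restart.</div>
</body></html>
"""

class PostExtractor(HTMLParser):
    """Collects the text of every <div class="post-body"> element."""
    def __init__(self):
        super().__init__()
        self.posts = []
        self._in_post = False

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "post-body") in attrs:
            self._in_post = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_post = False

    def handle_data(self, data):
        if self._in_post and data.strip():
            self.posts.append(data.strip())

def extract_posts(html_page):
    parser = PostExtractor()
    parser.feed(html_page)
    return parser.posts
```

An out-of-the-box crawler has no way of knowing which `div` holds the opinion and which holds the signature; that knowledge has to be programmed per site.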
String matching is clearly insufficient. Use a semantic platform
Now that you have the texts, it’s time to analyze them. String-based searches won’t work. Natural language is simply too tricky: ambiguity, irony, misspellings, and more make it very difficult. See: Recognizing entities in a text: not as easy as you might think!
3 examples demonstrate that you will need sophisticated tools to analyze natural language:
- Take the name “Washington”. It may refer to fairly well-known people (starting with George Washington), the state on the Pacific coast of the United States, the U.S. capital (Washington, D.C.), and quite a few other cities, institutions, and buildings in the same and other countries. It can even be a metonym for the federal government of the United States.
Semantic and contextual clues are needed for proper disambiguation. Are there any other references to the same name (maybe in a complete form) in the analyzed text? Can semantic analysis tell us if we are dealing with a person (performing human actions) or a place (where things happen)? Can we confidently establish a geographical context for the text? This could also lead us to favor particular interpretations.
- Avoiding repetition brings about further problems. In a given language, texts usually refer to the same entities in different forms. For instance, “Nelson Mandela”, “Mandela” (depending on the context), and “Madiba” are all recognized as the same entity by English speakers.
- Misspellings abound. If you find the misspelled word “Genva” in an English text, should you interpret it as Geneva (in French Genève) or Genoa (in Italian Genova)?
A semantic platform can deal with all these difficulties. The strategy for interpreting the unknown word correctly (identifying the meaning intended by the author) involves measuring the distance between the unknown word and other words that can be recognized as correct. In our example, if the text was typed on a QWERTY keyboard, the distance between Genva and Geneva involves a single deletion, while the distance between Genva and Genoa involves a substitution using a letter that is quite far away on the keyboard. So, using distance metrics, Geneva should be preferred. But contextual information is equally important for disambiguation. If our text includes mentions of places in Switzerland, or Switzerland can otherwise be established as the geographical context, then Geneva becomes more probable. If, instead, the text is about Mediterranean cruises, Genoa seems the natural choice.
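The distance-based strategy above can be sketched as a weighted edit distance, where a substitution costs more the farther apart the two keys sit on a QWERTY keyboard, while insertions and deletions (common typing slips) keep a flat cost. The key-grid coordinates and the 0.5 weighting factor below are illustrative assumptions, not a production metric:

```python
def keyboard_distance(a, b):
    # Approximate distance between two keys on a QWERTY grid
    # (row stagger ignored for simplicity).
    rows = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
    pos = {c: (r, i) for r, row in enumerate(rows) for i, c in enumerate(row)}
    (r1, c1), (r2, c2) = pos[a], pos[b]
    return abs(r1 - r2) + abs(c1 - c2)

def weighted_edit_distance(source, target):
    """Levenshtein distance where substitutions are penalized by how far
    apart the keys are; insertions and deletions cost a flat 1."""
    m, n = len(source), len(target)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)
    for j in range(n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if source[i - 1] == target[j - 1]:
                sub = 0.0
            else:
                sub = 1.0 + 0.5 * keyboard_distance(source[i - 1], target[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[m][n]
```

With this metric, "genva" is one dropped letter away from "geneva" (cost 1.0), whereas reaching "genoa" is cheaper via a deletion plus an insertion (cost 2.0) than via the heavily penalized v→o substitution, so Geneva wins, as argued above.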
Sentiment analysis too
Sentiment analysis (also known as opinion mining) applies natural language processing, text analytics, and computational linguistics to identify and extract subjective information from various types of content. Automating sentiment analysis allows us to process data that cannot be handled efficiently by human analysts because of its volume, variety, and velocity.
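As a minimal illustration of the idea (not of any particular platform), the core of lexicon-based sentiment scoring can be sketched in a few lines. The word lists and the one-token negation window are toy assumptions; a real semantic platform handles far richer phenomena, such as irony and domain-specific vocabulary:

```python
# Toy sentiment lexicons -- illustrative, not exhaustive.
POSITIVE = {"great", "love", "excellent", "helpful", "fast"}
NEGATIVE = {"bad", "hate", "broken", "slow", "useless"}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Signed score: +1 per positive word, -1 per negative word,
    flipping polarity when the preceding token is a negator."""
    tokens = text.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        word = tok.strip(".,!?")
        polarity = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        if polarity and i > 0 and tokens[i - 1].strip(".,!?") in NEGATORS:
            polarity = -polarity
        score += polarity
    return score
```

Even this toy version shows why automation scales where human reading does not: the same function scores ten posts or ten million.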
Cooking value from structured data
Once the unstructured natural language has been tamed, it is time to extract value from the sources. Dashboards, alerts, and query capabilities are the kinds of tools that bring you closer to the happy ending, namely, actionable insights that will improve the experience of your customers.