Category Archives: Industries

This category groups the different industries for which MeaningCloud offers solutions.

Automatic IAB tagging enables semantic ad targeting

Our Text Classification API supports IAB’s standard contextual taxonomy, enabling high-volume, high-speed content tagging in compliance with this model and easing participation in the new online advertising ecosystem. The result is ads served in the most appropriate context, with higher performance and better brand protection for advertisers.

What is IAB’s contextual classification and what is it good for?

The IAB QAG contextual taxonomy was initially developed by the Interactive Advertising Bureau (IAB) as the center of its Quality Assurance Guidelines program, whose aim was to promote the advertised brands’ safety, assuring advertisers that their ads would not appear in a context of inappropriate content. The QAG program provided certification opportunities for all kinds of agents in the digital advertising value chain, from ad networks and exchanges to publishers, supply-side platforms (SSPs), demand-side platforms (DSPs), and agency trading desks (ATDs).

The Quality Assurance Guidelines serve as a self-regulation framework to guarantee advertisers that their brands are safe, enhance advertisers’ control over the placement and context of their ads, and offer transparency to the marketplace by standardizing the information flowing among agents. All of this is achieved by providing a clear, common language that describes the characteristics of the advertising inventory and the transactions across the advertising value chain.

Essentially, the contextual taxonomy serves to tag content. It consists of two standard tiers (Tier 1 specifies the general category of the content, and Tier 2 a set of subcategories nested under that main category), plus a third tier (or more) that each organization can define for itself. The following pictures represent those standard tiers.
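In text form, the Tier 1 / Tier 2 nesting can be illustrated with a minimal sketch. The codes and category names below are a tiny illustrative sample of the taxonomy, and the helper function is our own, not part of any IAB or MeaningCloud tooling:

```python
# A tiny excerpt of the IAB contextual taxonomy: Tier 1 codes name a general
# category; Tier 2 codes ("IAB1-5") nest a subcategory under a Tier 1 parent.
IAB_TIERS = {
    "IAB1": "Arts & Entertainment",
    "IAB1-5": "Movies",
    "IAB19": "Technology & Computing",
}

def describe(code: str) -> str:
    """Render an IAB code as 'Tier 1 > Tier 2', using the parent encoded in it."""
    tier1 = code.split("-")[0]      # 'IAB1-5' -> 'IAB1'
    if tier1 == code:               # no '-' suffix: already a Tier 1 code
        return IAB_TIERS[code]
    return f"{IAB_TIERS[tier1]} > {IAB_TIERS[code]}"

print(describe("IAB1-5"))  # Arts & Entertainment > Movies
```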
Continue reading

Books Are a Service

Semantic Publishing and Voice of the Customer understanding for the media & content industry

The reason publishing is a key industry for taking advantage of text analytics is also the reason the industry finds it so hard to engage with the technology.

The reason? Text, and a lot of it. The publishing world has struggled to understand how data relates to text and to grasp the value of that data. This is changing (too slowly for many) as the industry moves from seeing itself as a ‘product’ company (making books, whether e-books or physical) to a ‘service’ company. In other words, smart publishers are starting to see their service to customers as that of creator and curator of information: content that can be mixed and mashed up in dynamic ways across a number of formats. This service is not bound, saddle-stitched or otherwise, to a specific product. This 180-degree change in perspective requires publishers to think about customer experience as directly as more traditional service-based industries like hospitality or even retail banking do.

Continue reading

Voice of the Customer in the insurance industry

For insurance companies, it is vital to listen to and understand the feedback that their current and potential customers express through all kinds of channels and touchpoints. All this valuable information is known as the Voice of the Customer. We have already dedicated a blog post to text mining in the insurance industry.

(This post is based upon the presentation given by MeaningCloud at the First Congress of Big Data in the Spanish Insurance Industry, organized by ICEA. We have embedded our PPT below.)

More and more insurance companies have come to realize that, since achieving product differentiation in this industry is far from easy, success depends on having satisfied customers.

Listening to, understanding, and acting on what customers tell us about their experience with our company directly improves the user experience and, as a result, profitability. In the post on Voice of the Customer and NPS, we examined this correlation between customer experience and profits in more detail.
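As a quick reminder of the metric itself, the Net Promoter Score is simply the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). A minimal sketch:

```python
def nps(scores):
    """Net Promoter Score: % of promoters (9-10) minus % of detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# 10 survey responses: 5 promoters, 3 passives (7-8), 2 detractors -> NPS = 30
print(nps([10, 9, 9, 10, 9, 8, 7, 7, 5, 3]))  # 30.0
```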


Continue reading

Text Analytics for Publishing: there’s metadata and smarter metadata

Everyone agrees metadata is great. It simplifies the management and packaging of content and data. It creates consistency and provenance for your content and data across an organization. Metadata gives you the 35,000-foot perspective needed to make strategic decisions. This is especially important for publishers, whose stock in trade is human language, which is completely opaque to machines whose world consists of zeros and ones. Your customers aren’t calling or emailing you to ask what is in such-and-such database. No. They are contacting you because they want to know which monographs you have by such-and-such professor, or asking you for all the archival material on ‘cats’, ‘World War 2’ or ‘nanotubes’. As a human, you understand exactly what they are looking for. If your ICT has a smidgen of metadata, you can dig around that such-and-such database, deliver the content and have a happy customer.

Intelligent content for Semantic Publishing

Metadata makes your content more intelligent. That’s why everyone agrees metadata is great. Great, that is, until they have to either enter the metadata or maintain the vocabularies. Some organizations are lucky: they have ensured there is support within the workflow, and people with the expertise to do the hard work, so that when a customer searches on the website, they quickly find what they are looking for and go away happy. But even those lucky few do not live in isolation. There is no publisher of consequence who doesn’t have to deal with third-party content and data. A huge amount of additional effort is spent shoehorning third-party content into the organization’s metadata models. Every publisher has a workflow that includes throwing away existing metadata entirely and then spending additional, wasteful effort adding metadata their CMS can handle. Does that sound familiar? Does it feel better to know you aren’t the only one?

Continue reading

#ILovePolitics: Popularity analysis in the news

If you love politics, regardless of your party or political orientation, you know that election periods are exciting, and good information is a must to increase the fun. That is why you follow the news, watch or listen to political analysis programs on TV or radio, read surveys, and compare points of view from one side and the other.

American politics in a nutshell

With this post we are starting a series of tutorials showing how to use MeaningCloud to extract interesting political insights and build your own political intel reports. MeaningCloud provides useful capabilities for extracting meaning from multilingual content in a simple and efficient way. Combining API calls with open-source libraries in your favorite programming language is so easy, and at the same time so powerful, that it is sure to awaken the Political Data Scientist hidden inside you. Be warned!

Our research objective is to analyze mentions of people, places, and entities in general in the Politics section of different news media. We will try to carry out an analysis that can answer the following questions:

  • Which are the most popular names?
  • Does their popularity depend on the political orientation of the newspaper?
  • Is it somehow correlated with popularity surveys or voting intention polls?
  • Do these trends change over time?

Before we begin

This is a technical tutorial in which we will do some coding. However, we will guide you through the whole process, so everyone can follow the explanations and understand the purpose of the tutorial.

For the sake of generality and better understanding, we will focus on U.S. Politics in English, but obviously you can easily adapt the same analysis for your own country or (MeaningCloud supported) language.

And last but not least, this tutorial uses PHP as the programming language for the code examples. However, any non-rookie programmer should be able to translate the scripts into the language of their choice.
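To give a flavor of the core bookkeeping involved, here is a sketch in Python (the tutorial itself uses PHP) of tallying the entities returned by the Topics Extraction API across a set of articles. The responses below are heavily abbreviated; the `entity_list` / `form` field names follow MeaningCloud’s JSON output:

```python
from collections import Counter

def tally_entities(api_responses):
    """Count how often each entity 'form' appears across a batch of
    Topics Extraction responses (field names follow MeaningCloud's JSON)."""
    counts = Counter()
    for resp in api_responses:
        for entity in resp.get("entity_list", []):
            counts[entity["form"]] += 1
    return counts

# Two (abbreviated) API responses, one per news article:
responses = [
    {"entity_list": [{"form": "Donald Trump"}, {"form": "Hillary Clinton"}]},
    {"entity_list": [{"form": "Donald Trump"}]},
]
print(tally_entities(responses).most_common(1))  # [('Donald Trump', 2)]
```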

Continue reading

Could Antidepressants Be the Cause of Birth Defects?

We agree that it is not at all typical for an information technology company to talk about antidepressants and pregnancy on its own blog. But here at MeaningCloud we have realized that health issues have a great impact on social networks, and companies in that industry, including pharmas, should try to understand the conversation that arises around them. How? Through text analysis technology, as discussed below.

Looking at the data collected by our prototype for monitoring health issues in social media, we were surprised by the sudden increase in mentions of the term ‘pregnancy’ on July 10. To understand why, we analyzed the tweets related to pregnancy and childbearing. It turned out that, on that same day, news had been published about a study in the British Medical Journal on the harmful effects that antidepressants can have on the fetus.
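Spotting a jump like that one is easy to automate. A minimal sketch follows; the threshold, window, and numbers are our own illustrative choices, not the prototype’s actual method:

```python
def is_spike(history, today, factor=3.0):
    """Flag a daily mention count that exceeds the recent average by `factor`."""
    mean = sum(history) / len(history)
    return today > factor * mean

# Mention counts of 'pregnancy' over the previous days, then the day in question:
daily_mentions = [40, 35, 52, 47, 38, 44]
print(is_spike(daily_mentions, 210))  # True: ~43 on average, 210 on the spike day
```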
Continue reading

Exploring Social Media for Healthcare Data

People enjoy sharing information through social media, including healthcare data. Yeah, it is true! And it constitutes the starting point of the research work titled ‘Exploring Spanish health social media for detecting drug effects’, which follows social media conversations to identify how people talk about their relation with drug consumption. This makes it possible to identify previously unknown adverse effects related to these drugs. Although there is a protocol for reporting an identified drug adverse effect to the authorities, only 5-20% of them are reported. Besides, conversations around drugs, symptoms, conditions and diseases can be analyzed to learn more about them. For example, it is possible to see how some people search for specific drugs using social media, while others sell them, perhaps illegally. Many others talk about mixing alcohol with drugs or other illegal substances. Of course, one cannot believe everything that appears on the Internet (that is another issue), but it can suggest hypotheses for further research.


Researchers from the Advanced Databases Group at Carlos III University of Madrid carried out the study, designing hybrid models to capture the knowledge needed to identify adverse effects. MeaningCloud is the natural language processing platform that supports the implementation of the analysis process based on those models. The customization capabilities provided by the platform were decisive in incorporating specific vocabulary and medical domain knowledge. As we know, the names of drugs and symptoms can be complex and, in some cases, difficult to write properly. The algorithm’s results are promising, with a 10% increase in recall compared to other known algorithms. You can find further details in the scientific paper published in the BMC Medical Informatics and Decision Making journal.

These developments have been part of the TrendMiner project and are now available in the prototype TrendMiner Health Analytics Dashboard, which shows people’s comments about antidepressants gathered from social media. The console displays mentions of antidepressants and related symptoms and, by clicking on any of them, their evolution over time. Moreover, the source texts analyzed to compute those mentions are shown at the bottom, with labels highlighting the names of drugs, symptoms or diseases, and any relations among them. Such relations may indicate that a drug is indicated for a symptom or that a disease is an adverse effect of the mentioned drug. The prototype also allows searching by ATC code (Anatomical Therapeutic Chemical Classification System) and the corresponding level in that classification scheme. So, if you mark the ‘By Active Substance’ selector, you search for any drug containing the active substance of the product you entered in the search box. Furthermore, the predictive search functionality makes it easier to find the right expression for a drug or disease. Please have a look at the prototype and tell us what you think about it. If you find a chart useful, you can even tweet it from there! Any comment is more than welcome.

Adverse effects of medications and social media monitoring

Adverse Drug Reactions (ADRs) are the biggest safety concern in the health field. They are the harmful and unintended effects of drugs administered for the prevention and treatment of illness, both at normal dosages and in cases of incorrect usage or medication errors. ADRs are the fourth leading cause of death among hospital patients in the U.S. The pharmacovigilance area is therefore receiving a great deal of attention at the moment, due to the high incidence of ADRs and the high associated costs (between 15 and 20 percent of hospital expenses are due to drug-related complications).

Certain adverse drug reactions are not discovered during clinical trials and only become known once the drug has been on the market for several years. Therefore, medicine regulatory agencies have to monitor ADRs after the drug reaches the market, and the main tool at their disposal is a system of voluntary notification whereby medical professionals and patients can report suspected ADRs (in Spain, patients have been able to do so since July 2012). However, these systems are rarely used; estimates indicate that only 5-20% of ADRs are reported, whether due to lack of time, the complexity of the process, lack of knowledge of ADRs, or poor coordination among healthcare staff.

As part of the European TrendMiner project, a prototype for analyzing comments on social networks has been built. It features MeaningCloud semantic analysis to recognize mentions of pharmaceutical drugs, adverse effects and illnesses. The system displays the evolution of these references and their “co-occurrences”, i.e., it registers which drugs are mentioned and what their adverse effects are. For example, the system monitors anti-anxiety drugs, taking into account not only references to the active ingredient or generic name of the drugs in this category (lorazepam and diazepam, among others) but also commercial brand names (such as Orfidal). In addition, all of these drug references can be analyzed in relation to their therapeutic effects (such as Orfidal being indicated for anxiety) and their adverse effects (such as Orfidal possibly causing shaking and tremors).
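The co-occurrence bookkeeping described above can be sketched in a few lines. The pairs below reuse the Orfidal examples from this post; in the real system, the drug and effect labels come from MeaningCloud’s semantic analysis, not from hand-written lists:

```python
from collections import Counter

def cooccurrences(documents):
    """Count (drug, effect) pairs that appear together in the same comment."""
    pairs = Counter()
    for doc in documents:
        for drug in doc["drugs"]:
            for effect in doc["effects"]:
                pairs[(drug, effect)] += 1
    return pairs

# Each entry holds the labeled mentions extracted from one social media comment:
comments = [
    {"drugs": ["Orfidal"], "effects": ["shaking", "tremors"]},
    {"drugs": ["Orfidal", "diazepam"], "effects": ["tremors"]},
]
print(cooccurrences(comments)[("Orfidal", "tremors")])  # 2
```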

To read more about this project, developed with Universidad Carlos III de Madrid, go to the university’s website.

Voice of the Customer: banking industry

The Voice of the Customer (VoC) is a market research technique that produces a detailed set of customer wants and needs, organized into a hierarchical structure, and then prioritized in terms of relative importance and satisfaction with current alternatives.

Voice of the Customer (VoC)

The Voice of the Customer (VoC) is not a new concept. In one way or another, it has been part of quality assurance processes for years, and yet its full integration into the workflow remains a pending task for many companies. The Voice of the Customer allows you to listen, interpret and react to what is being said, and then monitor the impact of your actions over time.

The current challenge companies face comes from the volume of data available. In this digital age, feedback is ever-growing and no longer limited to the periodic surveys sent to clients. Word of mouth has gone digital and has become more relevant than ever: everyone with a Twitter or Facebook account has an opinion, and more often than not it’s about the products and services they consume.

A typical client

Like so many other sectors, banking needs to figure out how to translate this first-hand source of knowledge their clients provide into something useful, something that can feed the company’s decision-making process.

Voice of the Customer combines two key aspects of information extraction: knowing in detail what the customer is talking about, and correctly interpreting how they feel about it. The former gives a quantitative view of the feedback obtained, while the latter provides a more qualitative analysis, measuring what clients think a company is doing right or wrong.

The banking domain has the added difficulty of providing an extremely wide array of products and services, each one of them with very specific subcategories and received through completely different channels.

Continue reading

The Role of Text Mining in the Insurance Industry

What can insurance companies do to exploit all their unstructured information?

A typical big data scenario

Insurance companies collect huge volumes of text on a daily basis and through multiple channels (their agents, customer care centers, email, social networks, the web in general). The information collected includes policies, expert and health reports, claims and complaints, survey results, and relevant interactions between customers and non-customers on social networks. It is impossible to manually handle, classify, interpret or extract the essential information from all that material.

The insurance industry is among those that stand to benefit most from applying technologies for the intelligent analysis of free text (known as Text Analytics, Text Mining or Natural Language Processing).

Insurance companies also have to cope with the challenge of combining the results of analyzing this textual content with structured data (stored in conventional databases) to improve decision-making. In this sense, industry analysts consider it essential to use multiple technologies based on Artificial Intelligence (intelligent systems), Machine Learning (data mining) and Natural Language Processing (both statistical and symbolic or semantic).

Most promising areas of text analytics in the Insurance Sector

Fraud detection

According to a report released by Accenture in 2013, insurance companies in Europe lose an estimated 8 to 12 billion euros per year to fraudulent claims, with an increasing trend. Additionally, the industry estimates that between 5% and 10% of the compensation paid by companies in the previous year went to fraudulent claims that could not be detected due to the lack of predictive analytics tools.

According to the specialized publication “Health Data Management”, Medicare’s fraud prevention system in the United States, based on predictive algorithms that analyze patterns in providers’ billing, saved more than 200 million dollars in rejected payments in 2013.

Continue reading