Adapt the functionality of our APIs to your scenario to achieve maximum accuracy in the analysis
MeaningCloud features a set of customization tools that allow users to adapt the functionality of the APIs to their scenario easily and without programming. With them you can create domain-specific dictionaries and models that provide optimal precision and recall when performing information extraction, classification or sentiment analysis.
Quality in text analytics: precision and recall
Text analytics APIs are not perfect. The analysis of text carried out by human experts isn’t perfect either: due to the ambiguity of language, the rate of agreement between human annotators does not exceed 85-95%.
The quality or accuracy of an analysis is usually assessed in terms of precision (the proportion of detected elements that are relevant) and recall (the proportion of relevant elements that are detected). In general, for a given analysis technology, precision and recall pull in opposite directions: improving one may worsen the other. For this reason, the key is to find the trade-off between the two that is optimal for the application.
For example, in an application for brand reputation monitoring in social media, high precision may be a priority, even if recall is low (the analysis is directionally correct, although some comments might go unnoticed). By contrast, a counter-terrorism application may require high recall (nothing is lost), despite low precision (false alarms that have to be reviewed manually).
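The two measures and the trade-off between them can be illustrated with a small Python sketch. The function name `precision_recall` and the comment identifiers are made up for the example; they are not part of any MeaningCloud API.

```python
def precision_recall(detected, relevant):
    """Return (precision, recall) for a set of detected items
    against the set of items that are actually relevant."""
    detected, relevant = set(detected), set(relevant)
    true_positives = len(detected & relevant)
    # Precision: share of detected elements that are relevant.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: share of relevant elements that were detected.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four comments really mention the brand.
relevant = {"c1", "c2", "c3", "c4"}

# A strict configuration flags few comments, all of them correct.
strict = {"c1", "c2"}
# A lenient configuration catches everything, plus some false alarms.
lenient = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}

strict_scores = precision_recall(strict, relevant)    # (1.0, 0.5)
lenient_scores = precision_recall(lenient, relevant)  # (0.5, 1.0)
```

The strict configuration resembles the brand monitoring case (high precision, lower recall), while the lenient one resembles the counter-terrorism case (high recall, lower precision).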
Why customize APIs?
The quality of a text analytics system depends on both the technologies and algorithms employed and the linguistic resources (ontologies, models) incorporated. For example, if a certain entity is not included in the resources used for topics extraction, it is unlikely to be detected. Or, if you want to identify which department of a company is being mentioned, the text classification model should include specific categories representing each department.
And, of course, it is impossible for a standard product to include in its resources all the topics, themes, etc. of every possible application. Incorporating the necessary linguistic resources in each case makes it possible to achieve optimal quality in the analysis. This is the purpose of MeaningCloud’s customization tools.
Imagine a user who needs to analyze customer feedback about a financial services company. A general-purpose ontology or dictionary may not cover this sector with enough depth and scope (in terms of products, people, themes, etc.), so relevant mentions, subjects or opinions may go unnoticed.
It will be necessary to complement those general resources with:
- The names of the most relevant companies, products, executives, etc. of the industry, to monitor them accurately.
- Taxonomies of product categories (deposits, mortgages, accounts, etc.) or interaction channels (office, phone, the web), to be able to classify the conversations.
- Positive/negative/neutral polarity of the various terms in different uses and contexts, e.g. the expression "the interest rate is very high" may be positive if it refers to deposits but negative if it refers to mortgages.
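As a rough illustration of the last point, context-dependent polarity can be thought of as a lookup keyed by both the expression and the product it refers to. The table and the `polarity` function below are hypothetical and only mirror the interest rate example; they do not reflect how MeaningCloud stores its models.

```python
# Hypothetical table: the same expression carries a different polarity
# depending on the product it refers to.
CONTEXT_POLARITY = {
    ("high interest rate", "deposits"): "positive",
    ("high interest rate", "mortgages"): "negative",
}


def polarity(expression, product, default="neutral"):
    """Look up the polarity of an expression in a given product context."""
    return CONTEXT_POLARITY.get((expression, product), default)


deposit_view = polarity("high interest rate", "deposits")    # "positive"
mortgage_view = polarity("high interest rate", "mortgages")  # "negative"
```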
MeaningCloud's customization tools
To tackle this type of scenario, MeaningCloud has a full repertoire of customization tools for adapting the functionality of the different APIs to the user's domain.
Create new entities and concepts, connected in an ontology, to be able to detect their appearances in a text.
Create new taxonomies and train/configure classification engines that categorize texts according to them.
Define the polarity of words (or groups of words) when they appear in different contexts and play different roles to adapt the sentiment analysis to your domain.
Deep categorization models
Create new taxonomies and configure engines that categorize texts with high granularity and accuracy, based on semantic analysis.
These customization capabilities are based on MeaningCloud's powerful Natural Language Processing technology.
The dictionary management tool lets you create new entities and concepts, assign semantic information to them and connect them in an ontology. Once the dictionary has been created, MeaningCloud APIs like Topics Extraction, Lemmatization, PoS and Parsing, and Sentiment Analysis can recognize these elements in a text and extract them, returning the related semantic information.
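Conceptually, a user dictionary maps surface forms to semantic information. The Python sketch below is a simplified stand-in, not MeaningCloud's dictionary format or API: the entries, the field names (`type`, `sector`) and the `extract` function are invented for illustration.

```python
import re

# Hypothetical user dictionary: each form carries semantic information,
# including the ontology type it hangs from.
USER_DICTIONARY = {
    "acme bank": {"type": "Organization>Company", "sector": "finance"},
    "flexisaver": {"type": "Product>Deposit", "sector": "finance"},
}


def extract(text, dictionary=USER_DICTIONARY):
    """Return the dictionary entries mentioned in the text, in order of
    appearance, together with their semantic information."""
    found = []
    for form, info in dictionary.items():
        pattern = r"\b" + re.escape(form) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append({"form": match.group(0), "start": match.start(), **info})
    return sorted(found, key=lambda item: item["start"])


mentions = extract("I moved my savings from Acme Bank to a FlexiSaver account.")
```

Here `mentions` holds one record for "Acme Bank" and one for "FlexiSaver", each with the type and sector defined in the dictionary. A text that mentions neither yields an empty list, which is the situation a general-purpose resource runs into with domain-specific names.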
The classification model management tool lets you create taxonomies (composed of hierarchies of categories) and build classification models for them. In this way, the Text Classification API can categorize texts according to such categories. In the definition of a category and the creation of a model that classifies it, MeaningCloud applies two technologies:
- One based on training texts and machine learning, by which a set of sample texts for each category is provided to the tool so that the system can automatically generate patterns for the classification.
- Another one based on the configuration of rules, which specify terms that must appear in a text, terms that should not appear, terms that increase the text's relevance with respect to a category and terms that reduce such relevance.
This combination of technologies (training and rules) brings together the fast implementation of the statistical approach and the high precision of rules.
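The rule-based side of this approach can be sketched in a few lines of Python. This is not MeaningCloud's rule syntax or engine: the `CategoryRule` class, the `relevance` function, the terms and the weights (+1 per boosting term, -0.5 per reducing term) are assumptions chosen only to show the four kinds of terms described above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryRule:
    """Hypothetical rule set for one category."""
    name: str
    required: frozenset = frozenset()   # terms that must appear
    forbidden: frozenset = frozenset()  # terms that must not appear
    boosting: frozenset = frozenset()   # terms that increase relevance
    reducing: frozenset = frozenset()   # terms that reduce relevance


def relevance(rule, text):
    """Score a text against a rule; 0.0 means the category does not apply."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    if not rule.required <= tokens:
        return 0.0  # a mandatory term is missing
    if rule.forbidden & tokens:
        return 0.0  # an excluded term is present
    score = 1.0 + len(rule.boosting & tokens) - 0.5 * len(rule.reducing & tokens)
    return max(score, 0.0)


mortgages = CategoryRule(
    name="Mortgages",
    required=frozenset({"mortgage"}),
    forbidden=frozenset({"deposit"}),
    boosting=frozenset({"rate", "loan"}),
    reducing=frozenset({"advert"}),
)

complaint_score = relevance(mortgages, "My mortgage rate is too high!")      # 2.0
mixed_score = relevance(mortgages, "Compare mortgage and deposit offers")    # 0.0
```

In a combined setup, a score like this would complement the patterns learned from training texts, with the rules used to correct the cases where the statistical model is not precise enough.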
The sentiment model management tool lets you define the polarity (positive, negative, neutral...) of words in a specific application scenario. Unlike other technologies available on the market that essentially define "bags of words" with either positive or negative polarity, this MeaningCloud tool goes much further and makes it possible to:
- Define the role of a word as a polarity vector (container, negator, modifier), with the option of using lemmas to easily incorporate the possible variants of each word
- Specify particular cases of a word's polarity, depending on the context in which it appears or its syntactic function in each case
- Define multiword expressions as priority elements in the evaluation of polarity
- Manage how these personal polarity models complement or replace the general models of each language.
The sentiment models defined with the tool become available so that the Sentiment Analysis API can assess the polarity according to them.
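To make the difference with a plain "bag of words" more concrete, here is a toy Python scorer that distinguishes containers, negators and modifiers and gives priority to multiword expressions. It is only a sketch under simplifying assumptions: the word lists, the weights and the `score` function are hypothetical, and it does not handle lemmas or syntactic function as the actual tool does.

```python
import re

# Hypothetical model entries, grouped by the role each word plays.
CONTAINERS = {"good": 1.0, "bad": -1.0, "problem": -1.0}  # carry polarity
NEGATORS = {"not", "never"}                                # flip polarity
MODIFIERS = {"very": 2.0, "slightly": 0.5}                 # scale polarity
MULTIWORD = {("no", "problem"): 1.0}                       # evaluated first


def score(text):
    """Sum the polarity of a text, applying pending negators and
    modifiers to the next polarity-bearing element."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total, sign, weight = 0.0, 1.0, 1.0
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTIWORD:
            # Multiword expressions take priority over their single words.
            total += sign * weight * MULTIWORD[pair]
            sign, weight = 1.0, 1.0
            i += 2
            continue
        token = tokens[i]
        if token in NEGATORS:
            sign = -sign
        elif token in MODIFIERS:
            weight *= MODIFIERS[token]
        elif token in CONTAINERS:
            total += sign * weight * CONTAINERS[token]
            sign, weight = 1.0, 1.0
        i += 1
    return total


negated = score("The service is not very good")  # -2.0
idiom = score("No problem at all")               # 1.0
```

A bag-of-words model would rate the first sentence as positive because of "good" and the second as negative because of "problem"; handling roles and multiword expressions reverses both results.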
Deep categorization models
The deep categorization model management tool lets you create taxonomies (composed of hierarchies of categories) and build models that categorize text with high granularity and accuracy, using morphosyntactic and semantic analysis of the text. To that end, the tool incorporates a semantic rule language for defining powerful, high-level expressions that leverage the detailed tagging of words, along with operators of diverse types. All this is possible without requiring large training sets, unlike other technologies. The result is models that are very powerful, accurate and easy to refine, and that are used by the Deep Categorization API.
The main benefit that these tools give users is the autonomy to develop their custom text analytics system. Other providers require the involvement of their professional services (usually expensive) to carry out a basic adaptation of their APIs.
In contrast, MeaningCloud's tools give users the autonomy to develop, easily and without programming, powerful analysis engines tailored to their needs, guaranteeing the highest quality.