The most accurate and detailed categorization

Deep Categorization represents the evolution of classification technologies. This API assigns one or more categories from a predefined taxonomy to different snippets of a text. By applying a powerful semantic rule technology, it provides maximum accuracy in the classification while allowing the fastest and most efficient definition of models.

In addition, the API can explain the categories assigned to each snippet by identifying the expressions in it that triggered the corresponding rules. Use it when you need maximum precision and detail in the categorization and want to optimize implementation costs.

MeaningCloud Deep Categorization API

There is an increasing need to industrialize the extraction of deep insights from text. In scenarios ranging from media monitoring to contract analytics, it is no longer enough to perform a superficial analysis of content to extract generic categories and unconnected mentions. You have to dig into the detailed meaning of these texts in a way that is scalable and affordable, given the immense volume and variety of that content.

The MeaningCloud Deep Categorization API is a first step in that direction, inaugurating a field that we at MeaningCloud call Deep Semantic Analytics. The API uses a taxonomy of predefined categories to categorize text at the snippet level rather than classifying the document as a whole. It attempts to discover blocks of text or passages that express "subcontexts" and reflect the structure of topics and subtopics within a document in order to understand it better.
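To make the difference between snippet-level and document-level categorization concrete, here is a minimal Python sketch. It is not the MeaningCloud engine or its API: the taxonomy, the keyword sets, and the function names are hypothetical, and plain keyword matching stands in for the semantic rules only to show the shape of the result, with one set of categories per passage instead of one per document.

```python
import re

# Hypothetical two-category taxonomy; the keyword sets are illustrative only.
TAXONOMY = {
    "Payment": {"invoice", "payment", "fee"},
    "Termination": {"terminate", "termination", "notice"},
}


def split_snippets(text):
    """Split a document into sentence-like snippets."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def categorize_snippets(text):
    """Return (snippet, sorted categories) pairs, one per snippet."""
    results = []
    for snippet in split_snippets(text):
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        categories = sorted(c for c, kws in TAXONOMY.items() if words & kws)
        results.append((snippet, categories))
    return results


document = (
    "The client shall settle each invoice within 30 days. "
    "Either party may terminate this agreement with written notice."
)
for snippet, categories in categorize_snippets(document):
    print(categories, "<-", snippet)
```

A document-level classifier would label this whole text as both Payment and Termination. The snippet-level view also tells you which passage deals with which topic.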

It applies novel technologies that allow you to create categorization models with maximum simplicity and adapt them to the context, with a minimum investment and short implementation times.

Semantic rule technology

The Deep Categorization engine incorporates an innovative rule-based technology that not only covers the lexical and grammatical levels, but also reaches the semantic level. These rules build on the deep morphosyntactic and semantic analysis that MeaningCloud performs on the text, and they define abstract patterns that refer to the grammatical functions and forms of the terms as well as to their semantic categories. This semantic information can come from both the product's ontology and personal dictionaries created by the user, which makes the rules very powerful and configurable.

In addition, the rules incorporate advanced expressions (regular expressions, proximity operators, logical operators) and macros that multiply their expressiveness. The advantage is that instead of basing the rules on the literal forms of the terms, we can base them on the function and meaning of the expressions. In this way, an optimal trade-off between simplicity (abstraction) and power in the creation of categorization models can be achieved.
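The idea of writing rules over meaning rather than literal forms can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not MeaningCloud's rule syntax or engine: the dictionary entries, the semantic class names (BRAND, NEGATIVE), the rule format, and the function names are all hypothetical. The rule fires when a term of one semantic class appears within a few tokens of a term of another class, and the result lists the expressions that triggered it.

```python
import re

# Hypothetical personal dictionary: term -> semantic class.
DICTIONARY = {
    "acme": "BRAND",
    "globex": "BRAND",
    "terrible": "NEGATIVE",
    "broken": "NEGATIVE",
}

# Hypothetical proximity rule: a BRAND within 3 tokens of a NEGATIVE term.
RULES = [
    {"category": "Brand complaint", "classes": ("BRAND", "NEGATIVE"), "window": 3},
]


def tag(text):
    """Return (token, semantic class or None) for every word in the text."""
    return [(t, DICTIONARY.get(t.lower())) for t in re.findall(r"\w+", text)]


def apply_rules(text):
    """Return matched categories with the expressions that triggered them."""
    tagged = tag(text)
    matches = []
    for rule in RULES:
        first, second = rule["classes"]
        for i, (tok_a, cls_a) in enumerate(tagged):
            if cls_a != first:
                continue
            # Look at the tokens inside the proximity window around tok_a.
            lo = max(0, i - rule["window"])
            hi = min(len(tagged), i + rule["window"] + 1)
            for tok_b, cls_b in tagged[lo:hi]:
                if cls_b == second:
                    matches.append(
                        {"category": rule["category"], "triggers": [tok_a, tok_b]}
                    )
    return matches


print(apply_rules("My Acme blender arrived broken"))
```

Because the rule refers to the class BRAND and not to the word "Acme", adding a new brand to the dictionary extends the category without touching the rule.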

Applications

The scenarios that can benefit from a detailed and precise categorization are innumerable, among them:

Content categorization

Classify all types of content (e.g., web pages, news) with great detail and precision so that you can retrieve, navigate, and relate them better, or insert targeted advertising.

Deep understanding of documents

Discover the topic and subtopic structure of a complex document (contract, financial report) to have a map of its meaning and to be able to focus subsequent analyses (e.g., extraction of terms in contract clauses).

Analysis of the Voice of the Customer

Analyze the unstructured feedback generated by customers in surveys, contact centers, or social media to discover their needs, perceptions, and preferences. Available as a MeaningCloud Vertical Pack.

Analysis of the Voice of the Employee

Manage talent more efficiently in your organization by discovering the opinions, desires, and skills that your employees express in performance assessments, surveys, or interviews. Available as a MeaningCloud Vertical Pack.

Advantages of the Deep Categorization API

Deep Categorization offers great advantages over other more traditional classification technologies:

Granular

It categorizes at the snippet level, discovering the thematic structure of a document, for example, to identify the clauses and passages of a contract that have to do with a certain topic.

Accurate

Because it is based on semantic rules, you can raise precision and recall to levels that are unattainable with other technologies, simply by adding focused rules.

Dynamic categories

Personal dictionaries can be used to define entities and concepts in certain semantic classes (e.g., brands) that are dynamically used in the rules that define the categories.

Self-explanatory

It justifies the assignment of the categories to the snippets, showing the expressions that have triggered the corresponding rules, and provides confidence metrics.

For short texts and complex documents

The rule-based technology provides good results both in texts with few words and in documents with multiple sections.

Predefined models

Use the API right away with its ready-made categorization models, such as those for the analysis of the Voice of the Customer or the Voice of the Employee included in MeaningCloud Vertical Packs.

Customizable

Use our tools for rule creation and validation to easily build models that are fully adapted to your scenario.

Does not require exhaustive training corpus

Unlike other technologies, semantic rules allow the development of models without the need for an extensive training set, with no more than an abstract understanding of the categories.

Multilingual

Currently available in 6 languages and soon in many more.

When to use Deep Categorization instead of Text Classification?

The Text Classification API offers a perfectly viable alternative for categorizing content according to a predefined taxonomy, especially when you have training texts to feed its machine learning technology. However, there are scenarios for which we recommend using Deep Categorization:

  • You want categorization at the snippet level, not just at the document level.
  • You need an explanation of the assignment of each category.
  • You don’t have sufficient training texts.
  • You want an optimal trade-off between accuracy and cost.