One of the APIs that has had the most “movement” in our recent updates is the Deep Categorization API, which, as many of you already know, provides an easier, more flexible, and more precise way to categorize texts. Most of this movement has come in the form of newly supported models, such as Intention Analysis, along with many under-the-hood improvements.
We are happy to announce that we have finally released the Deep Categorization customization console on our website.
This console lets you create accurate models for scenarios that require a very high level of linguistic precision to tell apart the categories you want to detect.
A customization tool for complex categorization problems
So, what do we mean by scenarios that require a high level of linguistic precision? Anyone who works on NLP tasks soon learns that language is as subtle and ambiguous as it is powerful. The ways in which people express themselves change continuously and vary enormously from one context to another.
Take the following cases:
- The same verb may indicate different stages of the customer journey depending on features such as its tense (“I will buy that”, “I’m buying that”) or its context, including negations or frequency adverbs (“I’m never buying that”), which machine learning solutions often treat as stopwords.
- Polysemous words may act as a different part of speech for each meaning, which gives a clear linguistic sign of which context we are in (“she shares her toys” vs. “she toys with her shares”).
- In some languages, a single lemma translates into dozens of forms that need to be taken into account.
Characteristics such as these are hard to capture with statistical approaches or with a simple rule language. We’ve run into this problem in the past when working on Voice of the Employee (VoE) or Voice of the Customer (VoC) scenarios, where the text came directly from users, answers could be expressed in a great variety of ways, and some of the categories were quite similar to one another.
In Deep Categorization, we combine the morphosyntactic analysis obtained from our core engines (which includes sentiment analysis as well as resource customization) with a flexible rule language that is both powerful and easy to understand, and that lets us tackle cases as complex as the ones mentioned above.
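To make the customer-journey example concrete, here is a minimal Python sketch of why morphosyntactic features matter. It is not MeaningCloud's rule language or API: the `Token` class, the `NEGATORS` and `STOPWORDS` sets, the stage labels ("intention", "purchase", "rejection"), and the hand-written annotations (lemma, part of speech, tense) are all hypothetical stand-ins for what an analyzer would produce, and the negation check is a deliberately crude window heuristic.

```python
# Hypothetical sketch; not MeaningCloud's rule syntax or API.
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical negation cues; a real analyzer would detect negation scope itself.
NEGATORS = {"not", "n't", "never", "no"}

# Illustrative stopword list of the kind a bag-of-words pipeline might drop.
STOPWORDS = {"i", "will", "'m", "never", "that"}


@dataclass(frozen=True)
class Token:
    form: str                    # surface form as written
    lemma: str                   # dictionary form
    pos: str                     # simplified part-of-speech tag
    tense: Optional[str] = None  # hand-annotated here: "future" or "present"


def negated(tokens: List[Token], i: int, window: int = 3) -> bool:
    """Crude heuristic: a negator appears within `window` tokens before token i."""
    return any(t.lemma in NEGATORS for t in tokens[max(0, i - window):i])


def journey_stage(tokens: List[Token]) -> Optional[str]:
    """Rule sketch: classify a mention of the verb 'buy' by negation and tense."""
    for i, t in enumerate(tokens):
        if t.lemma == "buy" and t.pos == "VERB":
            if negated(tokens, i):
                return "rejection"
            if t.tense == "future":
                return "intention"
            if t.tense == "present":
                return "purchase"
    return None


def bag_of_words(tokens: List[Token]) -> Set[str]:
    """What a naive pipeline keeps after lowercasing and removing stopwords."""
    return {t.form.lower() for t in tokens} - STOPWORDS


will_buy = [Token("I", "I", "PRON"), Token("will", "will", "AUX"),
            Token("buy", "buy", "VERB", "future"), Token("that", "that", "PRON")]
buying = [Token("I", "I", "PRON"), Token("'m", "be", "AUX"),
          Token("buying", "buy", "VERB", "present"), Token("that", "that", "PRON")]
never_buying = [Token("I", "I", "PRON"), Token("'m", "be", "AUX"),
                Token("never", "never", "ADV"),
                Token("buying", "buy", "VERB", "present"), Token("that", "that", "PRON")]

for s in (will_buy, buying, never_buying):
    print(journey_stage(s), sorted(bag_of_words(s)))
# intention ['buy']
# purchase ['buying']
# rejection ['buying']
```

After stopword removal, “I’m buying that” and “I’m never buying that” collapse to the same bag of words, while the rule that looks at lemma, tense, and negation keeps them apart. Matching on the lemma also lets one condition cover both “buy” and “buying”, which matters even more in languages where a lemma has dozens of forms.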
A powerful semantic rule language
Deep Categorization models can access the following features:
- Morphological aspects: defining a word as a lexical form or a lemma, specifying which part of speech it should be, or checking whether it is under the influence of a negation. For instance, this allows us to differentiate share as a verb (“she shares her toys”) from share as a noun (“she toys with her shares”).
Example: “She shares her toys” (share as a verb) / “She toys with her shares” (share as a noun)
- Semantic aspects: all the semantic information in MeaningCloud’s resources and any ontology type you want to define in user dictionaries.
Example: “She is going to buy a car” / “She’s buying strawberries”
- Logical operators: the classic AND, OR and NOT operators that let you combine different features to create complex rules.
- Context operators: ways to restrict the context, such as limiting it to a number of words or excluding specific instances from a larger list.
Example: “Deer are easily spooked” / “Cats are very independent”
- Distance operators: they enable us to define both strict and lenient contexts according to the distance between words.
Example: “They gave me what I requested promptly” / “They gave me an explanation but what I requested was still missing” (rule fragment: [give “what I requested”)
- Regular expressions: some of the flexibility of regular expressions is also available.
Deep Categorization models make it easy to create accurate categorization models for complex scenarios. Thanks to the flexibility of the rule syntax, you don’t need as many rules as you would with a simpler rule language. And, unlike machine or deep learning approaches, the model is very easy to fine-tune and you don’t need a training corpus to get started.
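The operators listed above can be illustrated with a small, self-contained Python sketch of how such conditions compose. It does not reproduce MeaningCloud's actual rule syntax: the helper names (`lemma`, `one_of`, `phrase`, `near`, `found`, `AND`, `OR`, `NOT`), the hand-annotated `form/lemma/POS` tokens, the example animal list, and the distance thresholds are all assumptions chosen to mirror the examples in the list.

```python
# Hypothetical sketch; not MeaningCloud's rule syntax or API.
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set


@dataclass(frozen=True)
class Token:
    form: str   # surface form as it appears in the text
    lemma: str  # dictionary form
    pos: str    # simplified part-of-speech tag


def sent(spec: str) -> List[Token]:
    """Build a hand-annotated sentence from 'form/lemma/POS' triples."""
    return [Token(*item.split("/")) for item in spec.split()]


# A matcher returns the token positions where it matches; a rule is a yes/no test.
Matcher = Callable[[List[Token]], Set[int]]
Rule = Callable[[List[Token]], bool]


def lemma(value: str, pos: str = "") -> Matcher:
    """Morphological match: a lemma, optionally restricted to one part of speech."""
    def match(tokens: List[Token]) -> Set[int]:
        return {i for i, t in enumerate(tokens)
                if t.lemma == value and (not pos or t.pos == pos)}
    return match


def one_of(lemmas: Set[str], exclude: FrozenSet[str] = frozenset()) -> Matcher:
    """Context restriction: any lemma from a list, minus excluded instances."""
    allowed = set(lemmas) - set(exclude)
    def match(tokens: List[Token]) -> Set[int]:
        return {i for i, t in enumerate(tokens) if t.lemma in allowed}
    return match


def phrase(text: str) -> Matcher:
    """Exact sequence of surface forms (case-insensitive); yields start positions."""
    words = text.lower().split()
    def match(tokens: List[Token]) -> Set[int]:
        forms = [t.form.lower() for t in tokens]
        n = len(words)
        return {i for i in range(len(forms) - n + 1) if forms[i:i + n] == words}
    return match


def near(a: Matcher, b: Matcher, max_gap: int) -> Matcher:
    """Distance operator: b must start 1..max_gap tokens after a match of a."""
    def match(tokens: List[Token]) -> Set[int]:
        starts_b = b(tokens)
        return {i for i in a(tokens) if any(0 < j - i <= max_gap for j in starts_b)}
    return match


def found(m: Matcher) -> Rule:
    return lambda tokens: bool(m(tokens))


def AND(*rules: Rule) -> Rule:
    return lambda tokens: all(r(tokens) for r in rules)


def OR(*rules: Rule) -> Rule:
    return lambda tokens: any(r(tokens) for r in rules)


def NOT(rule: Rule) -> Rule:
    return lambda tokens: not rule(tokens)


toys = sent("She/she/PRON shares/share/VERB her/her/PRON toys/toy/NOUN")
shares = sent("She/she/PRON toys/toy/VERB with/with/ADP her/her/PRON shares/share/NOUN")
share_verb = found(lemma("share", "VERB"))
share_noun = found(lemma("share", "NOUN"))

deer = sent("Deer/deer/NOUN are/be/AUX easily/easily/ADV spooked/spook/VERB")
cats = sent("Cats/cat/NOUN are/be/AUX very/very/ADV independent/independent/ADJ")
animal_but_not_cat = found(one_of({"deer", "cat", "horse"}, exclude=frozenset({"cat"})))

prompt = sent("They/they/PRON gave/give/VERB me/I/PRON what/what/PRON I/I/PRON "
              "requested/request/VERB promptly/promptly/ADV")
partial = sent("They/they/PRON gave/give/VERB me/I/PRON an/a/DET explanation/explanation/NOUN "
               "but/but/CCONJ what/what/PRON I/I/PRON requested/request/VERB "
               "was/be/AUX still/still/ADV missing/miss/VERB")
strict = found(near(lemma("give"), phrase("what I requested"), 3))   # strict context
lenient = found(near(lemma("give"), phrase("what I requested"), 6))  # lenient context
resolved = AND(strict, NOT(found(lemma("miss"))))

print(share_verb(toys), share_verb(shares))                  # True False
print(animal_but_not_cat(deer), animal_but_not_cat(cats))    # True False
print(strict(prompt), strict(partial))                       # True False
print(lenient(prompt), lenient(partial))                     # True True
print(resolved(prompt), resolved(partial))                   # True False
```

The design keeps positional matchers (which the distance operator needs) separate from sentence-level rules (which the logical operators combine), so each operator stays a small, composable function.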
See it in action in our upcoming webinar
The previous examples should give you a fair idea of how powerful the syntax is and a taste of what you can do with it. We will cover all of this in more detail in the upcoming webinars on June 18th (Spanish edition) and June 19th (English edition).
For any questions, we are available at firstname.lastname@example.org.