In both these new versions, the main focus is on user-defined models. We know how important it is to easily define the exact criteria you need, so the new classification API supports a new type of resource, the one generated with the Classification Models Customization console 2.0.
With these new versions, we’ve aimed to:
- Make criteria definition easier: more user-friendly operators to improve overall rule readability, and new operators to provide more flexibility.
- Remove dependencies between categories in a model that made their maintenance and evolution cumbersome.
- Give the user more control over where the relevance assigned to the categories comes from.
Let’s look at what’s new in a little more detail.
These are the main highlights for the new Classification Models Customization console:
- New syntax for the term definition. The syntax is much more readable and easier to understand. We have also added new operators to make definitions more flexible. One of these operators is NEAR, a proximity operator that lets you define contexts much more narrowly.
- There’s a less restrictive relationship between the statistical and rule-based classification, as well as more control over the weight each part adds to the final relevance.
- The stopwords defined for the model now only affect the statistical classification.
- There’s more control and traceability over the relevance in the output:
  - There are new parameters that allow you to configure different aspects of what adds weight to the final results.
  - The logic behind the weight derived from the different operators/rules is much clearer.
- No multiword dependency in the tokenization process. In the previous version, once you defined a multiword, it affected how every text was tokenized, generating dependencies between categories and making the debugging process cumbersome. We’ve completely removed this dependency: multiwords can still be defined, but they do not affect the tokenization.
- New lemmatization functionality. The rules in the models can now be defined using lemmas instead of having to include every single variant of the word in question.
- An explicit hierarchy can be assigned to categories instead of having to encode it implicitly in fields such as the code or the label.
- The build operation has been greatly improved, reducing the time it takes to build large models.
- Batch update of categories is now enabled through the import process.
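To make the idea of a proximity operator such as NEAR more concrete, here is a minimal Python sketch of the general technique. It is not MeaningCloud's implementation, and it does not use the console's rule syntax: the function names `tokenize` and `near`, the `max_distance` parameter and the simple word-level tokenization are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance tokens of each other.

    Hypothetical helper: it only illustrates what a proximity check does in general.
    """
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    # The check succeeds if any pair of occurrences is close enough.
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)


sentence = "The bank raised its interest rate today"
print(near(sentence, "bank", "rate", max_distance=5))
print(near(sentence, "bank", "rate", max_distance=2))
```

In this sentence "bank" and "rate" are four tokens apart, so the first check succeeds and the second does not. A rule that requires two terms to appear close together is stricter than one that only requires both to appear somewhere in the text, which is how a proximity operator narrows the context.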
In turn, these are the changes in the new Text Classification API 2.0:
- Explicit hierarchy can be obtained in the classification for the models where it’s defined.
- Now the abstract of a text can be added as a parameter. The weight derived from terms in it will be three times higher than for those terms found in the text.
- The service that provides the analysis has been improved with a twofold focus: maintaining the performance of the predefined models MeaningCloud provides, and improving the performance of large user-defined models.
- Compatibility with models created with Classification Customization Console 2.0.
- The IAB model has been retired from Text Classification. Now it’s served with Deep Categorization.
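As a worked illustration of the abstract weighting mentioned in the list above, the following Python sketch counts term occurrences and gives those found in the abstract three times the weight of those found in the text. Only the factor of three comes from the description above. The base weight of 1 per occurrence, the function name `term_weights` and the simple tokenization are assumptions, and the real service computes relevance in its own way.

```python
import re
from collections import Counter

# From the text: terms in the abstract weigh three times more than terms in the body.
ABSTRACT_MULTIPLIER = 3
# Assumed base weight per occurrence in the body text (illustrative only).
BASE_WEIGHT = 1


def count_tokens(text):
    """Count lowercase word tokens in a string (assumed, simplified tokenization)."""
    return Counter(re.findall(r"\w+", text.lower()))


def term_weights(text, abstract, terms):
    """Return a hypothetical weight per term, combining body and abstract occurrences."""
    body_counts = count_tokens(text)
    abstract_counts = count_tokens(abstract)
    return {
        term: BASE_WEIGHT * body_counts[term.lower()]
        + BASE_WEIGHT * ABSTRACT_MULTIPLIER * abstract_counts[term.lower()]
        for term in terms
    }


weights = term_weights(
    text="solar panels and solar farms",
    abstract="solar energy",
    terms=["solar", "energy"],
)
print(weights)  # {'solar': 5, 'energy': 3}
```

Here "solar" appears twice in the text and once in the abstract, giving 2 × 1 + 1 × 3 = 5, while "energy" appears only in the abstract and gets 3.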
This will give every user time to migrate their models and integrations to the new versions. You can find detailed information on how to carry out your migration in the migration guide.
The model limit included in your plan will apply to each console separately, so you can maintain both versions while you test them. In other words, if your plan includes two models, you will be able to have two models in each console for as long as both consoles exist.
In the coming weeks, we will be updating all the integrations that use Text Classification 1.1 to Text Classification 2.0.
If you have any questions, issues or just want to say hi, we are always available at firstname.lastname@example.org!