Below you will find the most relevant changes made to the customization console for classification models.

2.0 (10/Jun/2020)

These are the main highlights for the new console:

  • New syntax for term definitions. The syntax is much more readable and easier to understand. We have also added new operators to make definitions more flexible. One of them is NEAR, a proximity operator that allows you to define contexts much more narrowly.
  • The relationship between statistical and rule-based classification is less restrictive, and there is more control over the weight each part contributes to the final relevance.
    • The stopwords defined for the model now affect only the statistical classification.
  • There is more control over, and traceability of, the relevance in the output:
    • New parameters allow you to configure the different aspects that add weight to the final results.
    • The logic behind the weight derived from the different operators and rules is much clearer.
  • No multiword dependency in the tokenization process. In the previous version, once you defined a multiword, it affected how every text was tokenized, generating dependencies between categories and making the debugging process cumbersome. We've completely removed this dependency: multiwords can still be defined, but they do not affect the tokenization.
  • New lemmatization functionality. The rules in the models can now be defined using lemmas instead of having to include every single variant of the word in question.
  • Categories can now be assigned an explicit hierarchy instead of having to encode it implicitly in fields such as the code or the label.
  • The build operation has been greatly improved, reducing the time it takes to build large models.
  • Batch update of categories is now enabled through the import process.
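
As a general illustration of what a proximity operator such as NEAR does, here is a minimal Python sketch. It is not the console's rule syntax or implementation: the function names (tokenize, near), the simple word tokenizer, and the distance measure (difference in token positions, in either order) are all assumptions made for this example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a simplified, assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def near(text: str, term_a: str, term_b: str, max_distance: int) -> bool:
    """Return True if term_a and term_b occur at most max_distance token
    positions apart, in either order (hypothetical proximity check)."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    # Any pair of occurrences close enough satisfies the proximity condition.
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)
```

For instance, near("The bank raised interest rates", "bank", "rates", 3) is True, while the same call with a maximum distance of 2 is False, because "bank" and "rates" are three token positions apart. Restricting matches to nearby terms is what lets a rule capture a narrower context than simply requiring both terms anywhere in the text.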

See the migration guide we've published for more details on migrating a model from version 1.0 of the customization console.