Term rules (rule-based models)

After the main attributes there are four more fields where you can define the classification rules:

Category terms
  • Mandatory terms: list of mandatory terms.

    At least one term from this list must appear in a text for the text to be classified in the category. When you define mandatory terms, the list must therefore be thorough: a text will not be classified into the category unless at least one of its mandatory terms appears in it.

  • Excluding terms: list of excluding terms.

    If any term in this list appears in a text, the category for which it is defined is automatically excluded from the classification results. These terms must be as unambiguous as possible, so that the category is not excluded by mistake.

  • Relevant terms: list of relevant terms.

    The appearance in a text of any term in this list increases the relevance weight assigned to the category. A matching term does not necessarily mean that the text will be classified into the category: the final result depends on how many categories the text is classified into and on the relevance thresholds of the model.

  • Irrelevant terms: list of irrelevant terms.

    The appearance in a text of any term in this list decreases the relevance weight assigned to the category. It never excludes the category from the classification results; it only modifies its relevance value.
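
The way the four lists interact can be sketched in a few lines of Python. This is only an illustrative model of the behavior described above, not the engine's actual implementation: the names `CategoryRules` and `score_category`, the `base` and `step` values, and the whitespace-based whole-word matching are all assumptions made for the example, and it ignores multiwords, operators and relevance thresholds.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryRules:
    # Hypothetical container for the four term lists of one category.
    mandatory: list = field(default_factory=list)
    excluding: list = field(default_factory=list)
    relevant: list = field(default_factory=list)
    irrelevant: list = field(default_factory=list)


def score_category(text, rules, base=1.0, step=0.5):
    """Return an illustrative relevance score, or None if the category is discarded."""
    words = set(text.lower().split())

    # Mandatory terms: at least one must appear, otherwise no classification.
    if rules.mandatory and not words & set(rules.mandatory):
        return None

    # Excluding terms: a single match removes the category from the results.
    if words & set(rules.excluding):
        return None

    # Relevant / irrelevant terms only move the score up or down.
    # The amount per matched term (step) is an assumed value.
    score = base
    score += step * len(words & set(rules.relevant))
    score -= step * len(words & set(rules.irrelevant))
    return score


rules = CategoryRules(
    mandatory=["football"],
    excluding=["recipe"],
    relevant=["goal"],
    irrelevant=["stock"],
)
print(score_category("a late goal decided the football match", rules))
print(score_category("football recipe", rules))
```

In this sketch an irrelevant term lowers the score but never returns `None`, which mirrors the difference between irrelevant and excluding terms.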

If a term starts with three or more hash symbols (#), it is deactivated: the rule is not used in the classification. This is very helpful for debugging models without having to remove rules. By commenting out a rule in this way, you can easily test its effect on the overall classification.

Did you notice...?

If a rule is commented/deactivated with '###', it will not be used in the classification, but it will still be counted as a rule of the model.
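
As a rough illustration of this behavior, the following Python sketch separates active terms from deactivated ones while still counting every entry as a rule. The function names `is_active` and `split_rules` are hypothetical and only model the description above.

```python
def is_active(term):
    # A term starting with three or more hash symbols is deactivated.
    return not term.startswith("###")


def split_rules(terms):
    """Return (active terms used for classification, total rule count)."""
    active = [t for t in terms if is_active(t)]
    # Deactivated terms are skipped for classification but still counted.
    return active, len(terms)


active, total = split_rules(["football", "###soccer", "####rugby"])
print(active, total)
```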

If an active term is included in any of these lists for any category of a model, the model is no longer a statistical one, but rather a rule-based model (if there are no training texts) or a hybrid model.
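
The distinction between model types can be expressed as a small decision function. This is a sketch under assumptions: `model_type` is a hypothetical helper, `category_terms` is assumed to map each category name to the flat list of terms from its four lists, and treating a model with no active terms as statistical is inferred from the paragraph above rather than stated by it.

```python
def model_type(category_terms, has_training_texts):
    """Classify a model as 'statistical', 'rule-based' or 'hybrid'."""
    # A term is active unless it starts with three or more hash symbols.
    has_active_rule = any(
        not term.startswith("###")
        for terms in category_terms.values()
        for term in terms
    )
    if not has_active_rule:
        # Assumption: without active terms the model relies on training texts only.
        return "statistical"
    return "hybrid" if has_training_texts else "rule-based"


print(model_type({"Sports": ["football"], "Economy": []}, has_training_texts=False))
print(model_type({"Sports": ["###football"]}, has_training_texts=True))
```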

As explained in the next section, a number of operators are available for defining terms. You can define simple terms or multiwords and, in both cases, assign them specific contexts.