In this view you can modify the settings of a model. There are two sections: the first one covers the model's general settings, and the second one lets you modify the classification settings.
If the minimum absolute relevance is set to 0 and you are working with a statistical model, you will quite possibly get every category in the results: if the text you want to classify is long enough, it will have some similarity (albeit a very low one) to the training texts of each category.
The valid range is from 0 to 1, and the default value is 0. The table below shows an example of the filtering for different values of the parameter:
Category | Absolute relevance | Relative relevance | Threshold = 0.3 | Threshold = 0.51 |
---|---|---|---|---|
Category 1 | 0.8 | 0.8/0.8 = 1 (100%) | Kept | Kept |
Category 2 | 0.4 | 0.4/0.8 = 0.5 (50%) | Kept | Filtered out |
Category 3 | 0.2 | 0.2/0.8 = 0.25 (25%) | Filtered out | Filtered out |
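
The filtering shown in the table can be sketched in a few lines of Python. This is only an illustration, not the service's actual implementation: the function name `filter_by_relative_relevance` is hypothetical, and it assumes that a category is kept when its relative relevance is greater than or equal to the threshold.

```python
def filter_by_relative_relevance(absolute, threshold):
    """Keep the categories whose relative relevance reaches the threshold.

    `absolute` maps category names to absolute relevance values.
    Relative relevance is the absolute relevance divided by the highest
    absolute relevance among all categories.
    """
    if not absolute:
        return {}
    top = max(absolute.values())
    if top == 0:
        # Relative relevance is undefined when every score is zero.
        return {}
    relative = {name: score / top for name, score in absolute.items()}
    # Assumption: ">=" is used for the comparison; the table does not
    # cover the boundary case.
    return {name: rel for name, rel in relative.items() if rel >= threshold}


scores = {"Category 1": 0.8, "Category 2": 0.4, "Category 3": 0.2}
print(list(filter_by_relative_relevance(scores, 0.3)))   # ['Category 1', 'Category 2']
print(list(filter_by_relative_relevance(scores, 0.51)))  # ['Category 1']
```

With the default threshold of 0, no category is removed by this filter.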
NEAR and AND. By default, it's disabled.
Appearances of a term in the title field of the Text Classification can be given more importance than appearances in the text. The default value is 5; in other words, terms in the title count five times as much as terms in the text.
Likewise, appearances of a term in the abstract field of the Text Classification can be given more importance than appearances in the text. The default value is 3; in other words, terms in the abstract count three times as much as terms in the text.
Stopwords: list of words that do not provide any useful information to decide in which category a text should be classified. This may be either because they don't carry any meaning of their own (prepositions, conjunctions, etc.) or because they are too frequent in the classification context.
In the model creation step we saw that it is possible to associate the model with a language. This means, in practical terms, that when you create a model, a default list of stopwords for the chosen language is added. This list includes prepositions, conjunctions and the most common verbs.
In the image at the top of the page, we can see the list of stopwords you would obtain if the chosen language were English. The list of stopwords only affects the statistical classification.
The list is editable, so you can add or remove any item. Each stopword must be written on a separate line, and to save any changes you have to click the "Save" button.
It is not unusual to find that, in some scenarios, words that would normally be used to classify need to be added as stopwords. For example, when analyzing a company's customer feedback, the company name may not be relevant for the classification.
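
To make the effect of the title and abstract weights and of the stopword list more concrete, here is a minimal Python sketch of weighted term counting. It is a simplification under stated assumptions: the function `weighted_term_counts` is hypothetical, terms are split on whitespace and lowercased (the real tokenization is more involved), and the way the statistical classifier actually uses the counts is not shown.

```python
from collections import Counter


def weighted_term_counts(title, abstract, text, stopwords,
                         title_weight=5, abstract_weight=3):
    """Count terms, giving title and abstract appearances extra weight.

    The default weights mirror the defaults described above (5 and 3).
    Terms found in `stopwords` are ignored entirely.
    """
    counts = Counter()
    fields = ((title, title_weight), (abstract, abstract_weight), (text, 1))
    for content, weight in fields:
        # Assumption: naive whitespace tokenization, for illustration only.
        for term in content.lower().split():
            if term not in stopwords:
                counts[term] += weight
    return counts


# "acme" stands in for a company name added to the stopword list.
counts = weighted_term_counts(
    title="Acme refund",
    abstract="",
    text="the refund was late",
    stopwords={"the", "was", "acme"},
)
print(dict(counts))  # {'refund': 6, 'late': 1}
```

In this example the company name never contributes to the counts, while "refund" scores 5 for its appearance in the title plus 1 for its appearance in the text.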
When adding a new stopword, keep in mind a few guidelines, illustrated by the examples below:
Stopword | Is it correct? | Why isn't it correct? |
---|---|---|
agent | Yes | |
capitán (ES) | No | Accent marks are not allowed |
señor (ES) | Yes | |
captain america | No | Blank spaces are not allowed |
u.k | Yes | |
u. k | No | Blank spaces are not allowed |
… | No | Dashes are not allowed |
These limitations in the stopwords list come from the filtering process the system carries out before classifying a text. You can get more info about it in the tokenization section.
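
The checks illustrated in the examples table can be approximated with a short Python sketch, which may be handy for validating a list before pasting it into the editor. It is not the system's own validation: `check_stopword` is a hypothetical helper, and it assumes that "accent marks" means the grave, acute, circumflex and diaeresis marks, while the tilde in "ñ" is accepted, as the "señor" example suggests.

```python
import unicodedata

# Combining marks treated as accent marks in this sketch (an assumption):
# grave, acute, circumflex and diaeresis. The combining tilde used by "ñ"
# is deliberately left out.
ACCENT_MARKS = {"\u0300", "\u0301", "\u0302", "\u0308"}


def check_stopword(word):
    """Return the reasons why `word` is rejected (empty list if accepted)."""
    problems = []
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    if any(ch in ACCENT_MARKS for ch in decomposed):
        problems.append("Accent marks are not allowed")
    if any(ch.isspace() for ch in word):
        problems.append("Blank spaces are not allowed")
    if "-" in word:
        problems.append("Dashes are not allowed")
    return problems


# "spider-man" is a made-up example of a stopword containing a dash.
for candidate in ["agent", "capitán", "señor", "captain america", "u.k", "u. k", "spider-man"]:
    print(candidate, check_stopword(candidate))
```

An empty list means the candidate passes these three checks; any other rules applied by the tokenization step are not covered here.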