Stopwords are words that provide no useful information for deciding which category a text should be classified in. This may be either because they carry little meaning on their own (prepositions, conjunctions, etc.) or because they are too frequent in the classification context.
In the model creation step we saw that it is possible to associate the model with a language. This means, in practical terms, that when you create a model, a default list of stopwords for the chosen language is added. This list includes prepositions, conjunctions and the most common verbs.
The following image shows the list of stopwords you would obtain if the chosen language were English:
The list is editable, so you can add or remove any item. Each stopword must be written on a separate line, and to save any changes you have to click the "Save" button.
It is not unusual to find that, in some scenarios, words that would normally be used to classify need to be added as stopwords. For example, when analyzing a company's customer feedback, the company name may not be relevant for the classification.
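As a rough illustration of the idea, the sketch below filters stopwords out of a tokenized text after extending a default list with a domain-specific word. Everything in it is an assumption made for the example: the short `DEFAULT_STOPWORDS` set, the `remove_stopwords` helper and the company name "acme" are hypothetical, and the real filtering is done internally by the system.

```python
# Hypothetical default list; a real per-language list is much longer.
DEFAULT_STOPWORDS = {"the", "of", "and", "is", "to", "with"}


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stopword set (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stopwords]


# "acme" is a made-up company name. It appears in most of the company's own
# feedback, so it says nothing about which category a text belongs to.
custom_stopwords = DEFAULT_STOPWORDS | {"acme"}

tokens = "the delivery of my acme order is late".split()
filtered = remove_stopwords(tokens, custom_stopwords)
print(filtered)  # ['delivery', 'my', 'order', 'late']
```

Only the words that can help to tell categories apart are left after filtering.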
To add a new stopword, just keep in mind the following guidelines:
This is the message you will see if any of these guidelines is not satisfied:
|Stopword||Is it correct?||Why isn't it correct?||Result|
|capitán (ES)||No||Accent marks are not allowed||capitan|
|captain america||No||Blank spaces are not allowed|| |
|u. k||No||Blank spaces are not allowed|| |
| ||No||Dashes are not allowed|| |
These limitations in the stopwords list come from the filtering process the system carries out before classifying a text. You can get more info about it in the tokenization section.
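The restrictions shown in the table can be sketched as a small check to run on a list before pasting it into the editor. This is only an approximation under stated assumptions: it covers just the three rules visible in the table (accent marks, blank spaces, dashes), the function names are hypothetical, and "e-mail" is a made-up example of a word with a dash. The system's own validation may apply further rules.

```python
import unicodedata


def strip_accents(word):
    """Remove combining accent marks, e.g. 'capitán' becomes 'capitan'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def check_stopword(word):
    """Return the list of problems found in a candidate stopword (empty if none)."""
    problems = []
    # After NFD decomposition, an accented letter is a base letter followed
    # by one or more combining marks.
    if any(unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", word)):
        problems.append("Accent marks are not allowed")
    if any(ch.isspace() for ch in word):
        problems.append("Blank spaces are not allowed")
    if "-" in word:
        problems.append("Dashes are not allowed")
    return problems


# "e-mail" is a hypothetical example; the other entries come from the table.
for candidate in ["capitán", "captain america", "u. k", "e-mail", "captain"]:
    issues = check_stopword(candidate)
    print(candidate, "->", issues if issues else "OK")
```

For an entry rejected only because of accent marks, `strip_accents` gives the form shown in the "Result" column of the table.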