False positives can be resolved through rules, training texts or by modifying the list of stopwords:
There are three possible actions we can carry out:
A very common scenario is to find contexts that add ambiguity to the classification: "A" is always relevant for a category except when it appears in the same context as "B". We could cover this case by saying that "A" is relevant while "A WITH B" is irrelevant.
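The "A is relevant, A WITH B is irrelevant" logic can be sketched in a few lines of Python. This is only an illustration, not the rule engine itself: it assumes that "same context" means "same sentence", and the function names (`split_sentences`, `is_relevant`) and the example terms are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens(sentence):
    """Lowercased set of word tokens in a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def is_relevant(text, relevant_term, excluding_term):
    """Return True if relevant_term occurs in at least one sentence
    that does not also contain excluding_term.

    Assumption: the context of a WITH-style rule is a single sentence.
    """
    a, b = relevant_term.lower(), excluding_term.lower()
    for sentence in split_sentences(text):
        words = tokens(sentence)
        if a in words and b not in words:
            return True
    return False


# Hypothetical category "Animals": "jaguar" is relevant, "jaguar WITH car" is not.
print(is_relevant("The jaguar hunts at night.", "jaguar", "car"))
print(is_relevant("The new Jaguar car is fast.", "jaguar", "car"))
```

In this sketch an occurrence of "A" only counts when "B" is absent from the same sentence, so a text that mentions both terms in different sentences is still treated as relevant.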
It is important to take into account that if mandatory terms are added to a category, all possible cases must be considered in order to make the list as complete as possible.
This option can be applied to hybrid models and to rule-based ones.
The way to correct a false positive using training texts is simply to remove from the category the training texts similar to the one that produces the false positive. This solution is used less often, as it is more complicated than editing the rules.
This option can be applied to statistical models and to hybrid ones.
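One way to locate the training texts worth reviewing is to rank them by similarity to the text that produced the false positive. The sketch below uses Jaccard similarity over word sets as a stand-in measure. The measure, the threshold and the function names are assumptions made for illustration, not the similarity the model itself computes.

```python
import re


def token_set(text):
    """Lowercased set of word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the word sets of two texts (0.0 to 1.0)."""
    sa, sb = token_set(a), token_set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def similar_training_texts(false_positive, training_texts, threshold=0.5):
    """Return (score, text) pairs for the training texts whose similarity
    to the false positive reaches the threshold, most similar first."""
    scored = [(jaccard(false_positive, t), t) for t in training_texts]
    return sorted([(s, t) for s, t in scored if s >= threshold], reverse=True)


# Hypothetical training texts for one category.
training = [
    "the bank raised its interest rates again",
    "river bank erosion after the flood",
    "football results",
]
for score, text in similar_training_texts("the bank raised interest rates", training):
    print(round(score, 2), text)
```

The returned texts are only candidates for removal: each one should be reviewed by hand, since deleting a training text also affects the true positives it supported.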
If a term turns out to be irrelevant for the whole model and adds noise to some categories, a good option is to add it to the list of stopwords so that it is not taken into account in the classification.
This solution is not used very often, but it works well in the initial phase of a model's optimization, since that is when it is easiest to identify the terminology used in the domain, and thus the terms that are common within the domain but do not help in the classification.
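A rough way to spot terms that are common within the domain but do not help in the classification is to look for words that appear in the training texts of every category. The sketch below does that and then shows the effect of filtering them out. The data layout (a dictionary of category name to list of texts), the `min_share` parameter and the function names are hypothetical.

```python
import re
from collections import Counter


def token_set(text):
    """Lowercased set of word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))


def candidate_stopwords(texts_by_category, min_share=1.0):
    """Return the terms that occur in at least min_share of the categories.

    A term found in (almost) every category cannot help to tell the
    categories apart, so it is a candidate for the stopword list.
    """
    category_count = Counter()
    for texts in texts_by_category.values():
        seen = set()
        for text in texts:
            seen |= token_set(text)
        category_count.update(seen)  # count each term once per category
    total = len(texts_by_category)
    return sorted(t for t, n in category_count.items() if n / total >= min_share)


def remove_stopwords(text, stopwords):
    """Tokenize a text and drop the tokens found in the stopword list."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in stopwords]


# Hypothetical medical-domain training texts.
texts_by_category = {
    "cardiology": ["patient reports chest pain", "patient with arrhythmia"],
    "dermatology": ["patient shows skin rash"],
    "neurology": ["patient suffers migraine"],
}
stopwords = set(candidate_stopwords(texts_by_category))
print(stopwords)
print(remove_stopwords("Patient reports chest pain", stopwords))
```

The output of `candidate_stopwords` is a list of suggestions to review, not a list to apply blindly: a term shared by all categories may still matter in combination with other terms.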
It's important to remember that modifying any category may change the relevance values the model assigns to the rest of the categories.