What is Text Classification?

Text Classification assigns one or more classes to a document according to its content. Classes are selected from a previously established taxonomy (a hierarchy of categories or classes). The Text Classification API takes care of all the preprocessing tasks (text extraction, tokenization, stopword removal, and lemmatization) required for automatic classification.
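
To make the preprocessing steps concrete, here is a minimal Python sketch of tokenization, stopword removal, and lemmatization. It is only an illustration and not the API's internal pipeline: the function preprocess, the STOPWORDS set, and the LEMMAS lookup table are all hypothetical, and a real system would use a full stopword list and a proper lemmatizer.

```python
import re
from typing import Dict, List, Set

# Tiny illustrative stopword list (an assumption, not the API's actual list).
STOPWORDS: Set[str] = {"a", "an", "and", "are", "is", "of", "the", "to"}

# Toy lemma lookup standing in for a real lemmatizer (hypothetical entries).
LEMMAS: Dict[str, str] = {"movies": "movie", "running": "run", "ran": "run"}


def preprocess(text: str) -> List[str]:
    """Tokenize, drop stopwords, and map each token to its lemma."""
    # Tokenization: lowercase the text and keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stopword removal: discard very common function words.
    content = [t for t in tokens if t not in STOPWORDS]
    # Lemmatization: replace inflected forms with their base form when known.
    return [LEMMAS.get(t, t) for t in content]
```

For example, preprocess("The movies are running to the end") keeps only the content words and reduces them to their base forms.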

This API supports a variety of text classification scenarios, such as:

  • Binary classification, such as spam filtering (HAM, SPAM) or simple sentiment analysis (POSITIVE, NEGATIVE)
  • Multiclass classification - selecting one category among several alternatives, as in movie genre classification (thriller, horror, romance, etc.)
  • Multilabel categorization - assigning all categories that apply to a single document
  • Complex taxonomy categorization - assigning categories arranged in a multilevel taxonomy
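
The scenarios above differ mainly in how categories are selected once each one has a relevance score. The following Python sketch illustrates that difference under stated assumptions: the score dictionaries, the 0.5 threshold, the ">" path separator, and the function names pick_single, pick_multilabel, and with_ancestors are all hypothetical and are not part of the API.

```python
from typing import Dict, List


def pick_single(scores: Dict[str, float]) -> str:
    """Binary or multiclass: return the single best-scoring category."""
    return max(scores, key=scores.get)


def pick_multilabel(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multilabel: return every category whose score reaches the threshold."""
    return sorted(c for c, s in scores.items() if s >= threshold)


def with_ancestors(category: str, sep: str = ">") -> List[str]:
    """Multilevel taxonomy: expand a category path into all of its levels."""
    parts = category.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]
```

With scores of 0.7 for thriller, 0.6 for horror, and 0.1 for romance, pick_single selects only thriller, while pick_multilabel returns both horror and thriller. In a multilevel taxonomy, a leaf such as "Arts>Movies>Thriller" implies its parent categories as well, which is what with_ancestors makes explicit.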

The algorithm combines statistical document classification with rule-based filtering, which yields a high degree of precision in a wide range of environments. Statistical classifiers use example documents to define each category. Rule-based classifiers, in turn, help fine-tune the classification and correct the output of the statistical classifiers. Our rule-based classification language is also useful for bootstrapping a categorization when no examples are available.
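
The general idea of pairing a statistical classifier with rule-based filtering can be sketched in a few lines of Python. This is a toy illustration, not MeaningCloud's algorithm or rule language: the term-frequency scoring, the 0.2 threshold, the rule format (terms that force a category and terms that veto it), and all function names are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical rule format: category -> (terms that force it, terms that veto it).
Rules = Dict[str, Tuple[Set[str], Set[str]]]


def train(examples: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Build a term-frequency profile per category from example documents."""
    return {
        cat: Counter(w for doc in docs for w in doc.lower().split())
        for cat, docs in examples.items()
    }


def statistical_scores(profiles: Dict[str, Counter], text: str) -> Dict[str, float]:
    """Score each category by how much of its profile the text's words cover."""
    words = text.lower().split()
    scores = {}
    for cat, profile in profiles.items():
        total = sum(profile.values())
        scores[cat] = sum(profile[w] for w in words) / total if total else 0.0
    return scores


def classify(profiles: Dict[str, Counter], rules: Rules, text: str,
             threshold: float = 0.2) -> List[str]:
    """Apply the statistical step first, then let the rules correct its output."""
    words = set(text.lower().split())
    scores = statistical_scores(profiles, text)
    result = {c for c, s in scores.items() if s >= threshold}
    for cat, (force, veto) in rules.items():
        if words & force:
            result.add(cat)      # a rule can add a category the statistics missed
        if words & veto:
            result.discard(cat)  # a veto wins over both statistics and force terms
    return sorted(result)
```

A short usage example: the rules remove a statistical false positive, and they can also classify on their own when there are no training examples at all.

```python
profiles = train({
    "SPAM": ["win money now", "free money offer"],
    "HAM": ["meeting agenda attached", "lunch tomorrow"],
})
rules = {"SPAM": ({"unsubscribe"}, {"invoice"})}

classify(profiles, rules, "free money")          # statistics alone select SPAM
classify(profiles, rules, "free money invoice")  # the veto term removes SPAM
classify({}, rules, "click unsubscribe")         # rules only, no examples
```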

MeaningCloud provides a number of ready-to-use models that classify documents into standard taxonomies, but you can also define your own taxonomy and train your own model.