Text Classification in Excel

The Text Classification analysis integrates the functionality provided by the Text Classification API, that is, it allows to assign one or several categories to any text according to the model selected. The model used to classify the input text may be either one of the models included in the API or one of the models defined by the user.

For every language there's a default model that follows the International Press Telecommunications Council standard (IPTC) for news, which classifies texts in around 1400 categories.

This is the interface that will appear when you click the Text Classification button:

    Text Classification user interface

As you can see there are two sections in the interface: Input, which we have already covered in the corresponding section, and Analysis Settings.

In Analysis Settings there are two values to select:

  • Language, to select the language of the texts. By default, the language used in the last analysis will be preselected; in case it hasn't been set, the first element of the list will be the selected one.
  • Model, with the model that will be used to classify the texts. The models listed are determined by the language selected in the previous menu. In this list are also included any user-defined models associated to the key that's being used: they will be identified with a lock icon before the name, as follows:

      Text Classification user model

The following table contains the possible languages supported by the Text Classification functionality and which models are available for each one of them:

    ES EN FR IT PT CA DA SV NO FI
    IPTC
    IAB
    Social Media
    Eurovoc
    Business Reputation
    User-defined model lock icon[User-defined Models]

Advanced Settings

We've seen in the Settings section that there's an advanced settings menu with additional configuration options for the Text Classification analysis. These are the options available and their default value:

    Text Classification advanced settings

There are two main aspects to configure: the number of categories you want to see in the results, and which fields you want to output for each category.

  • Number of categories shown: set by default to 1 and with a maximum value of 10.
  • Fields:
    • Code: shows the code associated to the category.
    • Label: shows the label of the category; it's non-configurable, so it always appears in the results.
    • Rank: shows the rank or order in which a category has been associated to a text.
    • Relevance: shows the absolute relevance associated to the category.
    • Relative Relevance: shows the relative relevance associated to the category.

There's more information about each one of these fields in the response section of the API documentation.

Output

The results obtained from the classification will be shown in a new Excel sheet called "Text Classification". This sheet will include a column with the source text, a column with the IDs if enabled, and then a column for each of the output fields configured in the advanced settings.

When the analysis is configured to output more than one category, each additional category associated to a text will be inserted as a new row, allowing a more flexible use of the results.

This is an example of a possible output of texts in English classified using the IPTC model and without using IDs. The configuration is set to show the output fields configured by default and up to 3 categories: