Topics Extraction in Excel

The Topics Extraction analysis integrates the functionality provided by the Topics Extraction API. It lets you extract different kinds of topics from a text easily and without any development.

This is the interface that will appear when you click the Topics Extraction button:

    Topics Extraction user interface

You can see that there are two areas in the interface: Input, which we have already covered in the corresponding section, and Analysis Settings.

In Analysis Settings there are two elements to configure:

  • Language, to select the language of the texts. By default, the language used in the last analysis is preselected; if none has been set, the first element of the list is selected.
  • User dictionary, to select whether to use one of your user dictionaries in the analysis. The dictionaries that appear in this menu are the ones created through the dictionaries customization console by the user whose key is configured in the add-in.

    Topics Extraction user dictionary

Advanced Settings

As we saw in the Settings section, the Advanced Settings menu contains options that control what each of the available analyses outputs and, in some cases, how the analysis is carried out. These are the options for Topics Extraction and their default values:

    Topics Extraction advanced settings

In this case, there are two separate sections: Input Configuration, with the options used to call Topics Extraction, and Output Configuration, to configure the output.

  • Input Configuration: determines which types of topics are extracted from the text. There are six different types:
    • Entities: named entities
    • Concepts: concepts
    • Time Expressions: dates and times
    • Money Expressions: amounts of money
    • Quantity Expressions: quantities
    • Other Expressions: alphanumeric patterns
    At least one type of topic must be selected to save the configuration successfully.
  • Output Configuration: contains the fields which you can obtain in the output:
    • Form: displays the name by which the extracted topic is identified. It's not configurable, so it always appears in the results.
    • Topic category: shows the type of topic extracted.
    • Rank: contains the order in which the topics have been detected. The ranking is specific to each type of topic: the first entity detected is ranked 1, the first concept is also ranked 1, and so on.
    • Type: shows the type associated with the topic according to our ontology.
    • Theme: shows the theme of the topic, according to the ones described in our ontology.
    • Frequency: number of times the topic appears in the text.
    • Mentions: mentions of the topic in the text separated by commas.
    • Sense ID: ID of the topic (sense ID in the user dictionaries).
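
The Rank field described above restarts at 1 for each type of topic. The following Python snippet is only an illustrative sketch of that numbering, not the add-in's actual implementation; the function name, the field names and the sample detections are all hypothetical.

```python
from collections import defaultdict


def rank_topics(detected):
    """Number topics in detection order, keeping a separate counter per topic category.

    `detected` is a list of (topic_category, form) pairs in the order they
    were found. Names and structure here are assumptions for illustration.
    """
    counters = defaultdict(int)
    ranked = []
    for category, form in detected:
        counters[category] += 1  # each category has its own sequence
        ranked.append(
            {"topic_category": category, "form": form, "rank": counters[category]}
        )
    return ranked


# Hypothetical detections, listed in the order they were found
detected = [
    ("Entity", "London"),
    ("Concept", "city"),
    ("Entity", "Thames"),
    ("Money Expression", "20 pounds"),
]
rows = rank_topics(detected)
for row in rows:
    print(row["topic_category"], row["form"], row["rank"])
```

In this sketch the first entity and the first concept both receive rank 1, while the second entity receives rank 2.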

Output

The results obtained from the analysis will be shown in a new Excel sheet called "Topics Extraction". This sheet will include a column with the source text, a column with the IDs if enabled, and then a column for each of the output fields configured in the advanced settings.
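
As a rough sketch of how the sheet layout follows from the settings, the Python snippet below builds the list of column headers for a given configuration. The header names, their order and the function itself are assumptions made for illustration; the actual sheet may label or order its columns differently.

```python
# Optional output fields from the advanced settings (Form is always included).
# The order used here is an assumption.
OPTIONAL_FIELDS = [
    "Topic category",
    "Rank",
    "Type",
    "Theme",
    "Frequency",
    "Mentions",
    "Sense ID",
]


def output_columns(selected_fields, include_ids=False):
    """Return the column headers for a hypothetical "Topics Extraction" sheet."""
    columns = ["Source text"]
    if include_ids:
        columns.append("ID")  # only present when IDs are enabled
    columns.append("Form")  # not configurable, so always shown
    columns.extend(f for f in OPTIONAL_FIELDS if f in selected_fields)
    return columns


# Example: only Rank and Frequency selected, no IDs
print(output_columns({"Rank", "Frequency"}))
```

With every optional field selected and IDs disabled, this sketch yields the source text column followed by Form and the seven optional fields.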

This is an example of a possible output. It's configured to extract all the possible topic types, from several texts in English and without using IDs. All the fields available in the output are shown: