Language Identification in Excel

This analysis integrates the functionality provided by the Language Identification API. It lets you identify the language of a text easily, without any development work. You can see all the detectable languages and how they are detected in this table.

This is the interface that will appear when you click the Language Identification button:

    Language Identification user interface

As you can see, the interface has two sections: Input, which we have already covered in the corresponding section, and Analysis Settings.

In Analysis Settings there is a single value to set:

  • Threshold: the minimum relevance value a language must have to appear in the results. Allowed values range from 0 to 100, inclusive.
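
As a rough illustration of what the threshold does, the following Python sketch filters a list of candidate languages by relevance. It is not the add-in's implementation: the function name `filter_by_threshold`, the dictionary layout, and the sample relevance values are all hypothetical, and the comparison is assumed to be inclusive (a language whose relevance equals the threshold is kept).

```python
def filter_by_threshold(candidates, threshold):
    """Keep only the candidates whose relevance is at least `threshold`.

    Assumption: the comparison is inclusive; the add-in does not document
    how a relevance exactly equal to the threshold is treated.
    """
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100, inclusive")
    return [c for c in candidates if c["relevance"] >= threshold]


# Hypothetical candidates for a single text (made-up relevance values).
candidates = [
    {"language": "en", "relevance": 100},
    {"language": "nl", "relevance": 42},
    {"language": "de", "relevance": 17},
]

kept = filter_by_threshold(candidates, 40)
print([c["language"] for c in kept])
```

With a threshold of 0 every candidate is kept, so the threshold only starts to matter once it is raised above the lowest relevance values.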

Advanced Settings

As we've seen in the Settings section, there is an advanced settings menu with additional configuration options for Language Identification. These are the options and their default values:

    Language Identification advanced settings

There are two sections: Input Configuration, with the options used when calling Language Identification, and Output Configuration, which controls the output.

  • Input configuration: the languages to which the results can be limited, given as a whitelist of the values allowed to appear in the results. The whitelist contains language values as they would appear in the Language field, separated by commas. For instance, to limit the results to Spanish and English, set it to "es, en".
  • Output configuration: the fields to include in the output and the number of languages to show in the results:
    • Number of languages to show: set by default to 1 and with a maximum value of 5.
    • Fields shown:
      • Language: displays the ISO 639-1 code of the language found.
      • Name: name of the language.
      • Relevance: the relevance value assigned to the language, that is, how probable it is that the text is written in the detected language.
      • Rank: shows the order in which it was found.
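
The options above can be pictured with a small Python sketch that parses a whitelist such as "es, en" and limits the number of languages shown. This is only an approximation under stated assumptions: `parse_whitelist` and `select_languages` are hypothetical names, an empty whitelist is assumed to mean "no restriction", codes are assumed to be case-insensitive, and Rank is assumed to be the position of a language among the results that remain after filtering.

```python
def parse_whitelist(raw):
    """Split a comma-separated whitelist such as "es, en" into a set of codes.

    Assumption: surrounding spaces are ignored and codes are case-insensitive.
    """
    return {code.strip().lower() for code in raw.split(",") if code.strip()}


def select_languages(candidates, whitelist_raw="", max_languages=1):
    """Apply the whitelist, keep at most `max_languages`, and attach a rank.

    Assumptions: an empty whitelist means no restriction, and rank is the
    1-based position among the languages that survive the whitelist.
    """
    if not 1 <= max_languages <= 5:
        raise ValueError("max_languages must be between 1 and 5")
    allowed = parse_whitelist(whitelist_raw)
    kept = [c for c in candidates if not allowed or c["language"] in allowed]
    return [dict(c, rank=i) for i, c in enumerate(kept[:max_languages], start=1)]


# Hypothetical candidates in the order they were found (made-up values).
candidates = [
    {"language": "en", "name": "English", "relevance": 90},
    {"language": "fr", "name": "French", "relevance": 60},
    {"language": "es", "name": "Spanish", "relevance": 30},
]

for row in select_languages(candidates, "es, en", max_languages=2):
    print(row["rank"], row["language"], row["name"], row["relevance"])
```

Calling `select_languages(candidates)` with no arguments mirrors the defaults described above: no whitelist and a single language in the result.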

Output

The results obtained from the analysis will be shown in a new Excel sheet called "Language Identification". This sheet will include a column with the source text, a column with the IDs if enabled, and then a column for each of the output fields configured in the advanced settings.

When the analysis is configured to output more than one language, each additional language associated with a text is inserted as a new row, allowing a more flexible use of the results.
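
The row layout described above can be sketched in Python as follows. The function `to_rows` and the shape of the `results` data are hypothetical and the sample values are made up; the sketch only shows one row per detected language, with the source text first, the ID next when enabled, and then one column per selected output field.

```python
def to_rows(results, fields=("language", "name", "relevance", "rank"), include_ids=True):
    """Flatten per-text results into sheet-like rows, one row per language.

    Column order follows the description: source text, ID (if enabled),
    then one column for each selected output field. Texts with no detected
    language produce no rows in this sketch.
    """
    rows = []
    for item in results:
        for lang in item["languages"]:
            row = [item["text"]]
            if include_ids:
                row.append(item["id"])
            row.extend(lang[field] for field in fields)
            rows.append(row)
    return rows


# Hypothetical analysis results for two texts (made-up values).
results = [
    {
        "id": "1",
        "text": "Hello world",
        "languages": [
            {"language": "en", "name": "English", "relevance": 100, "rank": 1},
        ],
    },
    {
        "id": "2",
        "text": "Hola world",
        "languages": [
            {"language": "es", "name": "Spanish", "relevance": 70, "rank": 1},
            {"language": "en", "name": "English", "relevance": 55, "rank": 2},
        ],
    },
]

for row in to_rows(results):
    print(row)
```

Because every language gets its own row, the second text appears twice, which makes it straightforward to filter or pivot the sheet by language afterwards.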

This is an example of a possible output for texts in different languages. IDs are enabled, output cells are merged, and the configuration is set to show all the possible output fields and up to 3 languages.