Language Identification in Google Sheets

This analysis integrates the functionality provided by the Language Identification API. It lets you identify the language of a text easily, without any development work. You can see all the detectable languages and how they are detected in this table.

On the right, you can see the sidebar that appears when you click on Language Identification.

There are two sections in the interface: Select cells with texts to analyze, which we have already covered in the corresponding section, and Analysis settings.

In Analysis settings there is only one configurable value:

  • Threshold: the minimum relevance value a language must have to appear in the results. Allowed values range from 0 to 100, both included.
Language Identification user interface
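
The effect of the threshold can be pictured with a short Python sketch. This is not the add-on's code. The function name, the dictionary keys and the relevance values are made up for illustration, and it assumes that a language whose relevance equals the threshold is kept.

```python
from typing import Dict, List


def apply_threshold(candidates: List[Dict], threshold: float) -> List[Dict]:
    """Keep the candidates whose relevance is at least `threshold`.

    Hypothetical helper: it only mirrors the behavior described above.
    """
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100, both included")
    # Assumption: the comparison is inclusive ("minimum relevance value").
    return [c for c in candidates if c["relevance"] >= threshold]


# Made-up candidates for a single text; the numbers are illustrative only.
candidates = [
    {"language": "en", "relevance": 100},
    {"language": "es", "relevance": 42},
]

kept = apply_threshold(candidates, 50)
print([c["language"] for c in kept])
```

With a threshold of 0 every candidate is kept, which matches the lower end of the allowed range.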

Advanced settings

The Advanced settings menu contains additional options for Language Identification. It has two distinct sections: Input configuration, with the options used to call Language Identification, and Output configuration, with the options that control the output.

  • Input configuration lets you limit the results to a selection of languages, that is, a whitelist of the values that can appear in the results. The values in this whitelist must be written exactly as they would appear in the Language field, separated by "|". For instance, to limit the results to English and Spanish, set "es|en".
  • Output configuration lets you select which fields you want in the output, as well as the number of languages.
    • Number of languages shown: set by default to 1 and with a maximum value of 5.
    • Fields shown in the output:
      • Language: displays the ISO 639-1 code of the language found.
      • Name: name of the language.
      • Rank: shows the order in which it was found.
      • Relevance: the relevance value assigned to the language, that is, how probable it is that the text is in the detected language.
Language Identification advanced settings
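
As a rough illustration of how these advanced settings could combine, the following Python sketch parses a whitelist such as "es|en", limits the number of languages and keeps only the selected fields. It is a sketch under stated assumptions, not the add-on's implementation. The function names, the lowercase field keys and the sample data are hypothetical, the candidates are assumed to arrive already ordered by rank, and the minimum of 1 for the number of languages is an assumption (the text only states the default and the maximum).

```python
from typing import Dict, List, Optional, Sequence, Set

MAX_LANGUAGES = 5  # maximum stated for "Number of languages shown"


def parse_whitelist(value: str) -> Optional[Set[str]]:
    """Split a whitelist such as "es|en" into a set of language codes.

    Returns None for an empty value, meaning "no restriction".
    """
    codes = {code.strip() for code in value.split("|") if code.strip()}
    return codes or None


def select_languages(
    candidates: List[Dict],
    whitelist: str = "",
    count: int = 1,
    fields: Sequence[str] = ("language", "name", "rank", "relevance"),
) -> List[Dict]:
    """Apply the whitelist, the language count and the field selection."""
    # Assumption: at least one language must be requested.
    if not 1 <= count <= MAX_LANGUAGES:
        raise ValueError("count must be between 1 and 5")
    allowed = parse_whitelist(whitelist)
    kept = [c for c in candidates if allowed is None or c["language"] in allowed]
    # Rank values are passed through untouched; whether the real tool
    # renumbers them after filtering is not stated in the documentation.
    return [{f: c[f] for f in fields} for c in kept[:count]]


# Made-up candidates for one text, ordered by rank.
candidates = [
    {"language": "en", "name": "English", "rank": 1, "relevance": 100},
    {"language": "fr", "name": "French", "rank": 2, "relevance": 60},
    {"language": "es", "name": "Spanish", "rank": 3, "relevance": 40},
]

shown = select_languages(
    candidates, whitelist="es|en", count=2, fields=("language", "relevance")
)
print(shown)
```

In this example French is dropped by the whitelist, so the two languages shown are English and Spanish, each reduced to the Language and Relevance fields.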

Output

The results obtained from the analysis will be shown in a new spreadsheet called "Language Identification". This sheet will include a column with the source text, a column with the IDs if specified, and a column for each one of the output fields configured in the advanced settings.

When the analysis is configured to output more than one language, each additional language associated with a text is inserted as a new row, which allows more flexible use of the results.
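
The row layout described above can be sketched in Python as follows. This is only an illustration. The helper name, the column order (text, then ID, then output fields) and the choice to emit a blank row for a text with no detected language are assumptions, and the sample data is made up.

```python
from typing import Dict, List, Optional, Sequence


def build_rows(
    texts: Sequence[str],
    detections: Sequence[List[Dict]],
    fields: Sequence[str],
    ids: Optional[Sequence[str]] = None,
) -> List[List]:
    """Build one output row per (text, language) pair, preceded by a header."""
    header = ["Text"] + (["ID"] if ids is not None else [])
    header += [f.capitalize() for f in fields]
    rows: List[List] = [header]
    for i, (text, langs) in enumerate(zip(texts, detections)):
        prefix = [text] + ([ids[i]] if ids is not None else [])
        # Assumption: a text with no detected language still gets one row,
        # with the output fields left empty.
        for lang in langs or [{}]:
            rows.append(prefix + [lang.get(f, "") for f in fields])
    return rows


# Made-up input: two texts, the second with two detected languages.
texts = ["Hello world", "Hola mundo"]
ids = ["a", "b"]
detections = [
    [{"language": "en", "relevance": 100}],
    [{"language": "es", "relevance": 90}, {"language": "pt", "relevance": 30}],
]

rows = build_rows(texts, detections, ["language", "relevance"], ids)
for row in rows:
    print(row)
```

The sketch repeats the source text and ID on every row. A sheet that merges those cells shows the same data with the text appearing only once per group of rows.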

This is an example of a possible output for texts in different languages. IDs are enabled, output cells are merged, and the configuration is set to show all the available output fields and up to 3 languages: