Language Detection

Language Detection identifies the dominant language of a text. Detection is based on the franc library and uses N-grams, so the longer the text, the more reliable the result.
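
To give an intuition for why longer texts are detected more reliably, here is a minimal sketch of N-gram language detection in Python. It is not the franc implementation or its API: the names `trigram_profile`, `similarity`, `detect`, `SAMPLES` and `PROFILES` are hypothetical, the two reference sentences are toy data, and cosine similarity is just one simple way to compare profiles. A short input yields only a handful of trigrams, so its profile gives the comparison much less to work with.

```python
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sum(c * c for c in a.values()) ** 0.5
    norm_b = sum(c * c for c in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy reference samples (assumption: a real detector uses far larger profiles).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is what the people think about it",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y esto es lo que la gente piensa de ello",
}
PROFILES = {code: trigram_profile(sample) for code, sample in SAMPLES.items()}


def detect(text: str) -> str:
    """Return the code of the reference profile most similar to the text."""
    query = trigram_profile(text)
    return max(PROFILES, key=lambda code: similarity(query, PROFILES[code]))
```

Calling `detect` on a full sentence compares many trigrams against each profile, while a one-word input may share only one or two trigrams with any of them, which is why short texts are more likely to be misclassified.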

These are the settings for this recipe:

  • Input parameters:
    • Text column: the column that contains the texts to analyze. The list is populated with the column names of the input dataset.
  • Configuration:
    • API configuration preset: the license key and server to use in the API requests. Select one of the presets defined in the plugin Settings, or enter the values manually.

The output dataset will have two new columns: "language_code", with the ISO 639-1 language code (e.g. "en"), and "language_name", with the name of the language (e.g. "English"). The following example uses the dataset from this tutorial. The recipe is configured to obtain the language code.