Text Clustering in Google Sheets

Text Clustering integrates the functionality provided by the Text Clustering API. It performs automatic clustering of documents in order to group them by similarity and discover significant subjects.

On the right, you can see the sidebar that appears when you click on Text Clustering.

There are two sections in the interface: Select cells with texts to analyze, which we have already covered in the corresponding section, and Analysis settings.

In Analysis settings you can configure three elements:

  • Language, to select the language of the texts.
  • Mode, to select the mode to use in the clustering process.
  • Stopwords, to add terms that should be ignored in the clustering process. Enter each stopword on a separate line, as in the following example:
Text Clustering - analysis settings
Text Clustering user interface
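
The one-stopword-per-line format can be illustrated with a short Python sketch. It is not the add-on's code: the function name `parse_stopwords` is hypothetical, and trimming whitespace, skipping blank lines, and dropping repeated terms are assumptions about how such input would reasonably be cleaned.

```python
def parse_stopwords(raw: str) -> list[str]:
    """Turn the contents of a stopwords box into a list, one term per line."""
    terms = []
    for line in raw.splitlines():
        term = line.strip()             # assumption: surrounding spaces are ignored
        if term and term not in terms:  # assumption: blanks and repeats are dropped
            terms.append(term)
    return terms


# Four lines of input, one of them blank and one repeated.
print(parse_stopwords("hotel\nroom\n\n  staff \nhotel"))
```

A multi-word term such as "front desk" stays a single stopword here, because only line breaks separate entries.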

Important

To be able to use any of our language packs, you need to have access to them! You can request access in the developer home. You can read more about it here.

Advanced settings

The Advanced settings menu contains additional options for Text Clustering. It has a single section, Output configuration, which controls the output of the analysis.

In this section, you will be able to select which fields to show in the output:

  • Cluster: displays the title assigned to the cluster. This field is not configurable, so it's always shown.
  • Size: shows the size of the cluster.
  • Rank: shows the position of the cluster, ordered by relevance, for the document.
  • Score: shows the relevance value assigned to the cluster.

You can find more information about these fields in the response section of the API documentation.

Text Clustering advanced settings

Output

The results obtained from the analysis will be shown in a new spreadsheet called "Text Clustering". This sheet will include a column with the source text, a column with the IDs if enabled, and a column for each one of the output fields selected in the advanced settings.
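
As a rough illustration of how the settings determine the columns of the output sheet, the sketch below builds a header row in Python. The function name `output_columns`, the header labels, and the column order are all assumptions made for the example; the add-on may label or order its columns differently.

```python
def output_columns(include_ids=False, size=True, rank=True, score=True):
    """Build the header row for a given output configuration (hypothetical)."""
    columns = ["Text"]               # the source text is always present
    if include_ids:
        columns.append("ID")         # only when IDs are enabled
    columns.append("Cluster")        # not configurable, so always shown
    for label, enabled in (("Size", size), ("Rank", rank), ("Score", score)):
        if enabled:
            columns.append(label)
    return columns


print(output_columns())                                     # all output fields, no IDs
print(output_columns(include_ids=True, size=False, rank=False))
```

With every optional field disabled, the sheet in this sketch would still contain the source text and the cluster title.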

When a document is included in more than one cluster, each additional cluster is inserted as a new row.
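
The row-per-cluster layout can be sketched in Python as follows. This is only an illustration: the function `expand_rows`, the dictionary keys, and the sample values are hypothetical, and the sketch assumes that the source text is repeated on each additional row, which the description above does not state.

```python
def expand_rows(documents):
    """Return one (text, cluster, size, rank, score) tuple per document/cluster pair."""
    rows = []
    for text, clusters in documents:
        # A document in N clusters contributes N rows.
        # Assumption: the source text is repeated on every one of them.
        for cluster in clusters:
            rows.append((text, cluster["title"], cluster["size"],
                         cluster["rank"], cluster["score"]))
    return rows


# Made-up sample data: the first text belongs to two clusters, the second to one.
sample = [
    ("The room was clean and quiet", [
        {"title": "room", "size": 2, "rank": 1, "score": 0.9},
        {"title": "quiet", "size": 1, "rank": 2, "score": 0.4},
    ]),
    ("Great room service", [
        {"title": "room", "size": 2, "rank": 1, "score": 0.7},
    ]),
]

for row in expand_rows(sample):
    print(row)
```

Two input texts produce three rows in this example, since the first text appears once for each of its clusters.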

This is an example of a possible output for a number of texts. We are not using IDs, and the configuration is set to show all the possible output fields: