Request

Requests are made by submitting the parameters to the API endpoint using either GET or POST. POST is generally recommended because it avoids the maximum length limit that applies to parameters sent with GET.
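
The difference between the two methods can be sketched with Python's standard urllib. The helper below builds equivalent GET and POST requests but does not send them. With GET, every parameter goes into the URL, so the URL grows with the text. With POST, the same form-encoded parameters travel in the request body and the URL stays fixed. This is a minimal sketch: YOUR_KEY is a placeholder, build_requests is a hypothetical helper, and the form-encoded Content-Type is an assumption. This page does not specify the encoding.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://api.meaningcloud.com/clustering-1.1"


def build_requests(params: dict) -> tuple:
    """Build (without sending) equivalent GET and POST requests for `params`."""
    body = urlencode(params)
    # GET: all parameters travel in the query string, so the URL grows with the text.
    get_req = Request(f"{ENDPOINT}?{body}", method="GET")
    # POST: the same form-encoded parameters travel in the body; the URL stays fixed.
    post_req = Request(
        ENDPOINT,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},  # assumed encoding
        method="POST",
    )
    return get_req, post_req


# "YOUR_KEY" is a placeholder; 500 short lines make a long txt value.
params = {"key": "YOUR_KEY", "lang": "en", "txt": "\n".join(["some example text"] * 500)}
get_req, post_req = build_requests(params)
print(len(get_req.full_url), len(post_req.full_url))
```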

Endpoint

This is the endpoint to access the API.

Service          Method  URL
Text Clustering  POST    https://api.meaningcloud.com/clustering-1.1

If you are working with an on-premises installation, replace api.meaningcloud.com with your own server address.
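
The sketch below builds the endpoint URL from a configurable server address. It assumes that an on-premises installation exposes the same /clustering-1.1 path as the cloud API. The helper name, the scheme argument and the example host are hypothetical.

```python
from urllib.parse import urlunsplit

DEFAULT_HOST = "api.meaningcloud.com"
SERVICE_PATH = "/clustering-1.1"


def clustering_endpoint(host: str = DEFAULT_HOST, scheme: str = "https") -> str:
    """Return the Text Clustering endpoint URL for the given server address.

    Assumption: an on-premises server exposes the same path as the cloud API.
    """
    return urlunsplit((scheme, host, SERVICE_PATH, "", ""))


print(clustering_endpoint())  # https://api.meaningcloud.com/clustering-1.1
print(clustering_endpoint("meaningcloud.example.internal:8080", scheme="http"))
```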

Parameters

These are the supported parameters.

  • key: The access key required for making requests to any of our web services. You can get a valid access key for free just by creating an account at MeaningCloud. Required.
  • of: Output format. Values: xml, json. Optional. Default: of=json.
  • lang: Language of the texts. Values: en (English), es (Spanish), it (Italian), fr (French), pt (Portuguese), ca (Catalan), da (Danish), sv (Swedish), no (Norwegian), fi (Finnish). Required.
  • txt: One or more texts, one per line, as UTF-8 encoded text (plain text, HTML or XML). Each text sent in this parameter is automatically assigned a numerical ID, starting from 1, which identifies it in the output. For mode=dg, more than one text must be sent. Required.
  • id: The IDs associated with the input texts, one per line, as UTF-8 encoded text (plain text, HTML or XML). The number of IDs must equal the number of texts sent in txt. Optional. Default: id="".
  • mode: The approach used to carry out the clustering process. Values: tm (Topic Modeling), dg (Document Grouping). See the Clustering modes section for details. Optional. Default: mode="tm".
  • sw: Stopwords to be ignored by the algorithm, both in the clustering process and as cluster labels, as UTF-8 encoded text with one stopword per line (separated by a linefeed, "\n"). These stopwords are added to the default ones for the selected lang. Optional. Default: sw="".
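
The parameter rules above can be checked on the client before a request is sent. The sketch below uses a hypothetical helper, build_clustering_params. It joins texts, IDs and stopwords with "\n" as described, and it enforces the documented constraints: supported lang, mode and of values, a matching number of IDs, and more than one text for mode=dg. The helper also rejects items that contain line breaks. This check is our own assumption, based on the fact that a line break inside a text would otherwise split it into two texts.

```python
SUPPORTED_LANGS = {"en", "es", "it", "fr", "pt", "ca", "da", "sv", "no", "fi"}
SUPPORTED_MODES = {"tm", "dg"}
SUPPORTED_FORMATS = {"json", "xml"}


def build_clustering_params(key, lang, texts, ids=None, mode="tm", stopwords=None, of="json"):
    """Return a dict of form parameters for the Text Clustering API (hypothetical helper)."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang: {lang!r}")
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if of not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {of!r}")
    if not texts:
        raise ValueError("txt needs at least one text")
    if mode == "dg" and len(texts) < 2:
        raise ValueError("mode=dg needs more than one text")
    # Assumption: txt, id and sw are line-based, so an item containing a line
    # break would be read as several items.
    for item in list(texts) + list(ids or []) + list(stopwords or []):
        if "\n" in item or "\r" in item:
            raise ValueError(f"line break inside item: {item!r}")

    params = {"key": key, "of": of, "lang": lang, "mode": mode, "txt": "\n".join(texts)}
    if ids is not None:
        if len(ids) != len(texts):
            raise ValueError("the number of IDs must match the number of texts")
        params["id"] = "\n".join(ids)
    if stopwords:
        params["sw"] = "\n".join(stopwords)
    return params


# "YOUR_KEY" is a placeholder access key.
params = build_clustering_params(
    "YOUR_KEY", "en", ["first text", "second text"], ids=["a", "b"], mode="dg", stopwords=["foo", "bar"]
)
print(params)
```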

Clustering modes

The following clustering modes are currently available:

  • Topic modeling: this method groups the documents passed in the txt parameter by the n-gram that is most representative of their meaning. It changes the pipeline found in classical clustering algorithms, because it selects the labels that represent the clusters before grouping the texts. This approach helps to discover hidden themes in document collections, and it provides more descriptive labels than classical clustering algorithms. Cluster assignment is not exclusive (a text can belong to more than one cluster). There is always a default cluster called Other Topics, which contains the texts that do not belong to any other cluster.
  • Document grouping: this method implements the classic bisecting k-means algorithm. One of its most significant differences from topic modeling is that cluster assignment is exclusive, that is, a text can only be assigned to a single cluster. In this case, labels are less descriptive: they consist of a collection of terms that describe the documents assigned to the cluster. For large collections, the label is a single term.
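
For intuition about document grouping, here is a minimal, generic bisecting k-means in plain Python. It starts with all documents in one cluster and repeatedly splits the largest cluster in two with 2-means until k clusters exist. This illustrates the textbook technique and is not MeaningCloud's implementation. The sketch makes several simplifications:

  • toy 2-D points stand in for document vectors, and the distance is Euclidean;
  • the seeding is deterministic (the first point plus the point farthest from it);
  • only one trial split is made per step.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def _two_means(points: List[Point], max_iter: int = 20) -> Optional[Tuple[List[Point], List[Point]]]:
    """Split points in two with Lloyd's 2-means; None if no valid split exists."""
    # Deterministic seeding: the first point and the point farthest from it.
    c1 = points[0]
    c2 = max(points, key=lambda p: _dist2(p, c1))
    groups = None
    for _ in range(max_iter):
        left = [p for p in points if _dist2(p, c1) <= _dist2(p, c2)]
        right = [p for p in points if _dist2(p, c1) > _dist2(p, c2)]
        if not left or not right:
            break  # degenerate split (e.g. all points identical): keep last valid one
        if groups == (left, right):
            break  # assignments stopped changing: converged
        groups = (left, right)
        c1, c2 = _mean(left), _mean(right)
    return groups


def bisecting_kmeans(points: List[Point], k: int) -> List[List[Point]]:
    """Return up to k exclusive clusters, always bisecting the largest cluster."""
    clusters = [list(points)]
    done: List[List[Point]] = []  # clusters that cannot be split further
    while clusters and len(clusters) + len(done) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # largest remaining cluster
        split = _two_means(target) if len(target) > 1 else None
        if split is None:
            done.append(target)
        else:
            clusters.extend(split)
    return clusters + done


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)]
for cluster in bisecting_kmeans(points, k=3):
    print(cluster)
```

Each point ends up in exactly one cluster. This exclusive assignment is what distinguishes document grouping from topic modeling.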

So, which one should you choose? It depends on your use case. The main factors to take into account are that topic modeling gives more descriptive labels and more weight to outliers in the collection, while document grouping is the only mode that provides exclusive clustering.
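
The difference between exclusive and non-exclusive clustering can be illustrated with a toy version of the label-first idea behind topic modeling. The sketch makes two simplifications: the labels are supplied by hand, and they are matched as case-insensitive substrings. The actual service selects the representative n-grams itself. The function name label_first_grouping is hypothetical.

```python
from typing import Dict, List

OTHER = "Other Topics"


def label_first_grouping(texts: Dict[str, str], labels: List[str]) -> Dict[str, List[str]]:
    """Toy label-first grouping: one cluster per label, holding the IDs of every
    text that mentions it; texts matching no label go to 'Other Topics'."""
    clusters: Dict[str, List[str]] = {label: [] for label in labels}
    clusters[OTHER] = []
    for text_id, text in texts.items():
        lowered = text.lower()
        matched = [label for label in labels if label.lower() in lowered]
        for label in matched:
            clusters[label].append(text_id)  # non-exclusive: a text can join several clusters
        if not matched:
            clusters[OTHER].append(text_id)
    return clusters


texts = {
    "1": "Inflation pushes interest rates up",
    "2": "Inflation reaches football transfer fees",
    "3": "A quiet day at the beach",
}
print(label_first_grouping(texts, ["inflation", "football"]))
# {'inflation': ['1', '2'], 'football': ['2'], 'Other Topics': ['3']}
```

Text 2 lands in two clusters, which document grouping would never allow. Text 3 matches no label and falls into Other Topics.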