Text Classification 2.0: Migration Guide

We’ve recently published a new version of our Text Classification API, which comes hand in hand with a new version of the Classification Models Customization console.

In both new versions, the main focus is on user models. We know how important it is to define exactly the criteria you need, so the new Text Classification API supports a new type of resource: the models created with the Classification Model Customization Console 2.0.

In this post, we will talk about how to migrate to these new versions if you are currently using the old ones. Text Classification 1.1 and Classification Models 1.0 will be retired on 15/Sep/2020.

There are three possible migration scenarios:

  1. You are using any of the predefined models except for IAB.
  2. You are using the IAB model.
  3. You are using a user-defined model.

Migrating the API: from Text Classification 1.1 to Text Classification 2.0

Migrating an API integration has two parts: adapting the requests you send and handling the responses the API returns. In both cases the changes are minimal, so the migration should not be very costly.

The most relevant changes in the request are:

  • Endpoint: changes from https://api.meaningcloud.com/class-1.1 to https://api.meaningcloud.com/class-2.0.
  • debug (new parameter): when enabled, the response includes additional debug information about the model rules that have been triggered. It only applies to user-defined models.
  • abstract (new parameter): a descriptive abstract of the content. Terms relevant to the classification that appear in the abstract have more influence on the result than if they appeared in the text (but less than those in the title).
  • categories: renamed to categories_filter, with the same behavior.
  • expand_hierarchy (new parameter): lets you choose whether the results include the parents or the ancestors of the categories the content has been classified into. It only applies to models with an explicit hierarchy. By default no ancestors are shown, which matches the behavior of version 1.1.

All other Text Classification 1.1 parameters not mentioned above behave exactly the same in Text Classification 2.0.
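Since the request changes boil down to a new endpoint and one renamed parameter, an existing 1.1 integration can often be adapted in a single place. The following is a minimal Python sketch, assuming your integration builds its request parameters as a dictionary before sending them; the function name migrate_class_request and the placeholder values for key, txt and model are hypothetical, not taken from the API documentation.

```python
from typing import Dict, Tuple

# New endpoint, taken from the list of changes above.
CLASS_2_0_URL = "https://api.meaningcloud.com/class-2.0"


def migrate_class_request(params: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: adapt a 1.1 parameter dict for Text Classification 2.0.

    Assumes the request is built as a dict of form parameters. Only the
    endpoint change and the categories -> categories_filter rename come
    from the migration notes; every other parameter is passed through.
    """
    new_params = dict(params)  # copy, so the caller's dict is not modified
    if "categories" in new_params:
        # Same behavior in 2.0, new parameter name.
        new_params["categories_filter"] = new_params.pop("categories")
    return CLASS_2_0_URL, new_params


# Placeholder values: send whatever your 1.1 integration already sends.
url, params = migrate_class_request(
    {"key": "YOUR_KEY", "txt": "Text to classify", "model": "YOUR_MODEL", "categories": "YOUR_FILTER"}
)
print(url)  # https://api.meaningcloud.com/class-2.0
```

The new optional parameters (debug, abstract, expand_hierarchy) can be added to the same dictionary; their accepted values are described in the 2.0 documentation and are deliberately not assumed here.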

The response does not change much either. The two main changes are:

  • When debug is enabled, a new element, debug, appears in the output. It lists the rules triggered during classification, each with two values: the rule itself and the weight it adds.
  • Each term in term_list now also contains a field called abs_frequency with the frequency of the term in the classified text.
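If your code already walks term_list, picking up the new abs_frequency field is straightforward. A minimal Python sketch, assuming the response has already been parsed from JSON into a dict and that terms sit under category_list and term_list with a form key, as in the IAB response example further down this post; the function name term_frequencies and the sample data are hypothetical.

```python
from typing import Any, Dict


def term_frequencies(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: map category code -> {term form: abs_frequency}.

    Assumes a parsed JSON response with category_list[*].term_list[*].
    abs_frequency is new in 2.0, so it is read with .get() and comes back
    as None when absent (for example, in a 1.1 response).
    """
    result: Dict[str, Dict[str, Any]] = {}
    for category in response.get("category_list", []):
        terms: Dict[str, Any] = {}
        for term in category.get("term_list", []):
            terms[term.get("form")] = term.get("abs_frequency")
        result[category.get("code")] = terms
    return result


# Hypothetical response fragment, for illustration only.
sample = {
    "category_list": [
        {"code": "MY_CATEGORY", "term_list": [{"form": "restaurant", "abs_frequency": "2"}]}
    ]
}
print(term_frequencies(sample))  # {'MY_CATEGORY': {'restaurant': '2'}}
```

The debug element is left out of this sketch because its exact structure is not described here; inspecting a raw response with debug enabled is the safest way to see it.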

Easy peasy! You can read all the documentation for the new API here.

Migrating the API when using IAB: from Text Classification 1.1 to Deep Categorization 1.0

The IAB model is no longer going to be included as a predefined model in the Text Classification API. Instead, an improved version of this model, IAB 2.0, is provided with the Deep Categorization API. This migration is only needed if you are currently using the IAB model and wish to keep doing so.

Again, migrating the integration has two parts: adapting the requests you send and handling the responses the API returns. Let’s see the changes.

The most relevant changes in the request are:

  • Endpoint: changes from https://api.meaningcloud.com/class-1.1 to https://api.meaningcloud.com/deepcategorization-1.0.
  • of: Text Classification 1.1 supports json and xml; Deep Categorization 1.0 supports only json.
  • title: in Text Classification 1.1, the text sent as the title for classification. Deep Categorization 1.0 has no title parameter, so the title should be included with the rest of the content to analyze.
  • categories: does not exist in Deep Categorization 1.0.

All other Text Classification 1.1 parameters not mentioned above behave exactly the same in Deep Categorization 1.0.
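For IAB users the request changes are slightly more involved: a new endpoint, JSON-only output, no title and no categories. A minimal Python sketch of that adaptation, assuming the content is sent in the txt parameter and that prepending the title on its own line is an acceptable way to include it with the rest of the content; the function name migrate_iab_request and the newline separator are illustrative choices, not documented behavior. The sketch leaves every other parameter untouched (including the model you request), so check the Deep Categorization documentation for the values it accepts.

```python
from typing import Dict, Tuple

DEEPCAT_URL = "https://api.meaningcloud.com/deepcategorization-1.0"


def migrate_iab_request(params: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: adapt a Text Classification 1.1 IAB request
    for Deep Categorization 1.0. Assumes the content is in "txt"."""
    new_params = dict(params)  # copy, so the caller's dict is not modified

    # Deep Categorization 1.0 only supports JSON output. Fail loudly rather
    # than silently switching formats under code that parses XML.
    if new_params.get("of", "json") != "json":
        raise ValueError("Deep Categorization 1.0 only supports of=json")

    # There is no title parameter: merge the title into the content.
    # The newline separator is an arbitrary choice for this sketch.
    title = new_params.pop("title", None)
    if title:
        new_params["txt"] = title + "\n" + new_params.get("txt", "")

    # There is no categories parameter: drop it.
    new_params.pop("categories", None)
    return DEEPCAT_URL, new_params


url, params = migrate_iab_request(
    {"txt": "Body of the article", "title": "Article title", "categories": "YOUR_FILTER"}
)
print(params)  # {'txt': 'Article title\nBody of the article'}
```

Raising on of=xml is a deliberate choice: code that parses XML responses has to change anyway. Likewise, if you relied on categories to restrict the results, you will presumably need to filter category_list yourself after the call.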

The response does not change much either. The main change is in the term_list field, where each term now also includes an offset_list:

Text Classification 1.1 (IAB):

```json
{
  "category_list": [
    {
      "code": "Food&Drink>DiningOut",
      "label": "Food & Drink>Dining Out",
      "abs_relevance": "2",
      "relevance": "100",
      "term_list": [
        {
          "form": "restaurant",
          "abs_relevance": "2"
        }
      ]
    }
  ]
}
```

Deep Categorization 1.0 (IAB 2.0):

```json
{
  "category_list": [
    {
      "code": "Food&Drink>DiningOut",
      "label": "Food and Drink>Dining Out",
      "abs_relevance": "2",
      "relevance": "100",
      "term_list": [
        {
          "form": "restaurant",
          "abs_relevance": "2",
          "offset_list": [
            {
              "inip": "19",
              "endp": "28"
            }
          ]
        }
      ]
    }
  ]
}
```
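The new offset_list makes it possible to locate each term in the analyzed text. The example above suggests that inip and endp are character positions with endp inclusive ("restaurant" is ten characters long and spans 19 to 28), but that is an inference from this single example, not documented behavior. Under that assumption, here is a minimal Python sketch that recovers each occurrence; the helper name term_occurrences and the sample sentence are hypothetical, the sentence chosen so that "restaurant" starts at position 19.

```python
from typing import Any, Dict, List, Tuple


def term_occurrences(text: str, response: Dict[str, Any]) -> List[Tuple[Any, Any, str]]:
    """Hypothetical helper: return (category code, term form, matched text)
    for every offset in a parsed Deep Categorization response.

    Assumption: inip/endp are character positions and endp is inclusive.
    """
    hits = []
    for category in response.get("category_list", []):
        for term in category.get("term_list", []):
            for offset in term.get("offset_list", []):
                start = int(offset["inip"])  # offsets arrive as strings
                end = int(offset["endp"])    # assumed inclusive
                hits.append((category.get("code"), term.get("form"), text[start:end + 1]))
    return hits


text = "We had dinner at a restaurant downtown."
response = {
    "category_list": [
        {
            "code": "Food&Drink>DiningOut",
            "term_list": [
                {"form": "restaurant", "abs_relevance": "2",
                 "offset_list": [{"inip": "19", "endp": "28"}]}
            ],
        }
    ]
}
print(term_occurrences(text, response))
# [('Food&Drink>DiningOut', 'restaurant', 'restaurant')]
```

If the matched text ever comes back shifted by one character, the inclusive-endp assumption is the first thing to revisit.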

You can read all the documentation for the Deep Categorization API here, and read about the differences in the new IAB version here.

Heads up!

Pay close attention to how credits are counted for Deep Categorization, as your consumption may increase depending on the length of the texts you are classifying!

Migrating your model to the Classification Model Customization Console 2.0

So what happens if you already have a working model defined in the old customization console? You may be wondering if you have to redefine the whole thing… Don’t worry, we’ve got you!

[Image: Migrate button]

If you open any of your models, you will see a new button called “Migrate” in the Actions section of the sidebar.

Clicking it opens a dialog from which you can launch a process that migrates the model automatically to the new version. The process creates a new model called “[your-model-name]-Migrated” in the customization console 2.0.

This new model will contain the same information as the old one, but any rules defined for it will be translated into the new rule syntax (which is one of the most significant changes). When the process is done, you will be redirected to a report of the migration with all relevant details.

The following image shows the migration report we obtain for the example model we provide:

[Image: Migration report]

The migration process does the following:

  1. Creates a new model in the 2.0 console.
  2. Updates it with the settings of your current model that apply to the new version (including stopwords). All other settings take their default values, except lemmatization, which is disabled so that the migrated model behaves more like the original.
  3. Creates in the migrated model the same categories that exist in your current model.
  4. Updates the new categories with the transformed information from the old ones:
    • Training text is copied to the new category as is.
    • Rules are translated into the new syntax.

The migration report details how each element has been transformed, so you can check whether everything is correct:

[Image: Migration detail]

The only way to view the report again after leaving it is to rerun the migration, so we recommend downloading it as a PDF using the button in the bottom right corner.

Once your model has been migrated, you should check that it behaves as expected. Since you will still have access to the previous version, you can classify the same collection of texts with both models and adjust any differences that appear. Some of the things to check are:

  • Relevance values: the relevance is computed differently in this new version, so you will need to adjust the relevance thresholds defined in the settings. This especially applies to the minimum absolute relevance, which is now limited to values between 0 and 1. Anything higher than that will give you a warning in the migration process.
  • Terms definition:
    • Now that lemmatization is supported, it may help simplify your current rules.
    • Some of the new operators can help you define more accurate rules.
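A simple way to spot differences is to run the same collection through both models and compare the categories returned for each text. A minimal Python sketch, assuming you have already collected the category codes per text (for instance from the code fields in category_list) into dictionaries keyed by a text identifier of your own; all function names, identifiers and category codes below are hypothetical.

```python
from typing import Any, Dict, List


def category_codes(response: Dict[str, Any]) -> List[Any]:
    """Extract the category codes from a parsed classification response."""
    return [category.get("code") for category in response.get("category_list", [])]


def compare_classifications(
    old: Dict[str, List[str]], new: Dict[str, List[str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Report, per text id, the categories returned by only one of the models."""
    report: Dict[str, Dict[str, List[str]]] = {}
    for text_id in sorted(old.keys() | new.keys()):
        old_codes = set(old.get(text_id, []))
        new_codes = set(new.get(text_id, []))
        if old_codes != new_codes:
            report[text_id] = {
                "only_old": sorted(old_codes - new_codes),
                "only_new": sorted(new_codes - old_codes),
            }
    return report


old_results = {"doc1": ["Sports"], "doc2": ["Politics", "Economy"]}
new_results = {"doc1": ["Sports"], "doc2": ["Politics"]}
print(compare_classifications(old_results, new_results))
# {'doc2': {'only_old': ['Economy'], 'only_new': []}}
```

Texts that show up in the report are good starting points for reviewing relevance thresholds and rules in the migrated model.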

The model limit of your plan applies to each console separately, so you can maintain both versions while you test them. In other words, if your plan includes two models, you can have two models in each console while both consoles exist.

If you have any questions, issues or just want to say hi, we are always available at support@meaningcloud.com!

