Topics Extraction 2.0: Migration guide

We have released a new version of our information extraction API, Topics Extraction. In Topics Extraction 2.0:

  • The topics extracted have been reordered to extract information in a more coherent way.
  • Configuration options have been changed to make analyses more flexible and the available options easier to understand.
  • We’ve refactored our code with two short-term goals in mind:
    • Improving the quality of concept/keyword extraction.
    • Making user dictionaries easier and more flexible to use.
  • A new element, quantity expressions, has been added to cover a specific type of information that was hard to obtain with previous versions.
  • Some fields at the output have been modified, either to give them more appropriate names or to make them easier to use and understand.
  • A configurable interface language has been added to improve multilingual analyses.

All these improvements mean the migration process is not as fast as it would be with a minor version. In this post, we explain what you need to know to migrate your applications from Topics Extraction 1.2 to Topics Extraction 2.0.

Request

The most relevant changes in the request are the following:

Endpoint
  • Topics Extraction 1.2: http://api.meaningcloud.com/topics-1.2
  • Topics Extraction 2.0: http://api.meaningcloud.com/topics-2.0

Parameter ilang
  • Topics Extraction 1.2: did not exist.
  • Topics Extraction 2.0: new parameter. It accepts the same values as lang.

Parameter tt
  • Topics Extraction 1.2 accepted values:
    • e: named entities
    • c: concepts
    • t: time expressions
    • m: money expressions
    • u: uri expressions
    • p: phone expressions
    • o: other expressions
    • q: quotations
    • r: relations
    • a: all
  • Topics Extraction 2.0 accepted values:
    • e: named entities
    • c: concepts
    • t: time expressions
    • m: money expressions
    • n: quantity expressions
    • o: other expressions
    • q: quotations
    • r: relations
    • a: all

Parameter dm
  • Topics Extraction 1.2 accepted values:
    • 0: no disambiguation
    • 1: morphosyntactic disambiguation mode
    • 2: basic disambiguation
    • 3: light disambiguation
    • 4: strong disambiguation
    • 5: full disambiguation
  • Topics Extraction 2.0: it has been divided into two parameters, dm (type of disambiguation) and sdg (grouping in case of ambiguity). These two new parameters have the following equivalences to the old dm:

    Topics 1.2    Topics 2.0
    dm=0          dm=n
    dm=1          dm=m
    dm=2          dm=s, sdg=n
    dm=3          dm=s, sdg=t
    dm=4          dm=s, sdg=l
    dm=5          dm=s, sdg=g

Parameter cs
  • Topics Extraction 1.2: y or n.
  • Topics Extraction 2.0: removed. The API behaves as it did with the default value.

Parameter dic
  • Topics Extraction 1.2: use of thematic dictionaries; by default, all of them were loaded.
  • Topics Extraction 2.0: removed. The API behaves as it did with the default value.

All the other parameters from Topics Extraction 1.2 not explicitly mentioned behave exactly the same in Topics Extraction 2.0.
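
The request changes above can be sketched as a small helper that rewrites a dictionary of 1.2 parameters into its 2.0 equivalent. This is only an illustration under a few assumptions: the function and constant names are hypothetical, tt is assumed to be a string of one-letter codes (for example "ec"), and requesting entities in place of the removed u and p values is our reading of the response changes, not a documented rule.

```python
OLD_ENDPOINT = "http://api.meaningcloud.com/topics-1.2"
NEW_ENDPOINT = "http://api.meaningcloud.com/topics-2.0"

# Old dm value -> new (dm, sdg) pair, copied from the equivalence table.
# sdg is None where the table lists no sdg value.
DM_EQUIVALENCE = {
    "0": ("n", None),
    "1": ("m", None),
    "2": ("s", "n"),
    "3": ("s", "t"),
    "4": ("s", "l"),
    "5": ("s", "g"),
}

REMOVED_PARAMS = ("cs", "dic")     # no longer accepted in 2.0
REMOVED_TOPIC_TYPES = ("u", "p")   # uri and phone expressions


def migrate_params(old_params):
    """Return a copy of Topics Extraction 1.2 parameters adapted to 2.0."""
    params = {k: v for k, v in old_params.items() if k not in REMOVED_PARAMS}

    # Split the old numeric dm into the new dm/sdg pair.
    old_dm = str(params.get("dm"))
    if old_dm in DM_EQUIVALENCE:
        dm, sdg = DM_EQUIVALENCE[old_dm]
        params["dm"] = dm
        if sdg is not None:
            params["sdg"] = sdg

    if "tt" in params:
        old_tt = params["tt"]
        # Assumption: tt is a string of one-letter codes such as "ecu".
        tt = "".join(c for c in old_tt if c not in REMOVED_TOPIC_TYPES)
        # Assumption: since URIs and phone numbers are now returned as
        # entities, ask for entities when the old call asked for u or p.
        asked_removed = any(c in old_tt for c in REMOVED_TOPIC_TYPES)
        if asked_removed and "e" not in tt and "a" not in tt:
            tt += "e"
        params["tt"] = tt

    return params
```

With this sketch, a 1.2 call that used tt=ecup, dm=4 and cs=y would be sent to the 2.0 endpoint with tt=ec, dm=s and sdg=l, and without cs.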

Response

The complete response is described in detail in the documentation, so here we’ll just comment on the most important points:

  • There’s a new topic type, quantity_expressions, which detects things such as percentages or other quantities expressed in the text.
  • The items previously shown under the topic type phone expressions now appear in the output as entities of the Top>Id>PhoneNumber type.
  • The items previously shown under the topic type uri expressions now appear in the output as entities of the Top>Id>URL and Top>Id>Email types.
  • The dictionary field now shows the name of the user dictionary the element comes from, instead of just saying whether it comes from a user dictionary or not.
  • The field id in entity and concept will always appear, either with the id assigned in resources or with an automatically assigned one when the element has been detected heuristically in the analysis.
  • The structure of the quotation_list has changed slightly. Now all the fields related to who says the quote are grouped under the same object; the same applies to the verb information.
  • The field amount inside the money_expression_list elements has changed to amount_form.
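
Code that used to read phone numbers and URIs from their own lists now has to look for them among the entities. A minimal sketch of that lookup follows; the function name is hypothetical, and it assumes the type appears in sementity.type exactly as written in the bullets above, with no further subtypes.

```python
# Entity types that replace the old phone and uri expression lists.
ID_TYPES = {
    "Top>Id>PhoneNumber": "phone",
    "Top>Id>URL": "url",
    "Top>Id>Email": "email",
}


def collect_id_entities(response):
    """Group the forms of phone, URL and email entities in a 2.0 response."""
    found = {"phone": [], "url": [], "email": []}
    for entity in response.get("entity_list", []):
        # Assumption: the type is an exact match on sementity.type.
        entity_type = entity.get("sementity", {}).get("type", "")
        kind = ID_TYPES.get(entity_type)
        if kind is not None:
            found[kind].append(entity["form"])
    return found
```

If the API returns more specific subtypes under Top>Id, a prefix match with str.startswith would be the safer choice.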

The two responses below show how the analysis of the sentence “He said he wanted 50% of the $6 million from the robbery that occurred in London” changes between versions: first the Topics 1.2 output, then the Topics 2.0 output.

Topics 1.2
{
  "status": {
    "code": "0",
    "msg": "OK",
    "credits": "1"
  },
  "entity_list": [
    {
      "form": "London",
      "id": "01d0d69c7d",
      "sementity": {
        "class": "instance",
        "fiction": "nonfiction",
        "id": "ODENTITY_CITY",
        "type": "Top>Location>GeoPoliticalEntity>City"
      },
      "semgeo_list": [
        {
          "adm1": {
            "form": "England",
            "id": "98db781864"
          },
          "adm2": {
            "form": "Greater_London",
            "id": "ed00f6dec4"
          },
          "continent": {
            "form": "Europe",
            "id": "0404ea4d6c"
          },
          "country": {
            "form": "United_Kingdom",
            "id": "d29f412b4b",
            "ISO3166-1-a2": "GB",
            "ISO3166-1-a3": "GBR"
          }
        }
      ],
      "semld_list": [
        "od:ZW4ud2lraTpMb25kb24",
        "http://en.wikipedia.org/wiki/London",
        "http://es.wikipedia.org/wiki/Londres",
        "http://fr.wikipedia.org/wiki/Londres",
        "http://it.wikipedia.org/wiki/Londra",
        "http://ca.wikipedia.org/wiki/Londres",
        "http://pt.wikipedia.org/wiki/Londres",
        "http://zh.wikipedia.org/wiki/伦敦",
        "http://ar.wikipedia.org/wiki/لندن",
        "http://gl.wikipedia.org/wiki/Londres",
        "http://eu.wikipedia.org/wiki/Londres",
        "http://rdf.freebase.com/ns/m.04jpl",
        "http://sws.geonames.org/2643743/",
        "http://linkedgeodata.org/triplify/node107775",
        "http://data.nytimes.com/14085781296239331901",
        "http://sw.cyc.com/concept/Mx4rvVjWPJwpEbGdrcN5Y29ycA",
        "http://yago-knowledge.org/resource/London",
        "http://umbel.org/umbel/rc/Location_Underspecified",
        "http://umbel.org/umbel/rc/PopulatedPlace",
        "http://umbel.org/umbel/rc/Village",
        "@BBCLondres2012",
        "@LDN",
        "@OlimpicoCaracol",
        "@TelevisaLondres",
        "@TimeOutLondon",
        "@visitlondon",
        "sumo:City"
      ],
      "variant_list": [
        {
          "form": "London",
          "inip": "74",
          "endp": "79"
        }
      ],
      "relevance": "100"
    }
  ],
  "concept_list": [
    {
      "form": "$",
      "id": "^_9145003407816029121",
      "dictionary": "*",
      "sementity": {
        "class": "class",
        "type": "Top>Unit>Currency"
      },
      "variant_list": [
        {
          "form": "$",
          "inip": "29",
          "endp": "29"
        }
      ],
      "relevance": "100"
    },
    {
      "form": "robbery",
      "id": "c3784c490b",
      "sementity": {
        "class": "class",
        "fiction": "nonfiction",
        "id": "ODENTITY_OFFENCE",
        "type": "Top>OtherEntity>Offence"
      },
      "semld_list": [
        "sumo:Offence"
      ],
      "variant_list": [
        {
          "form": "robbery",
          "inip": "49",
          "endp": "55"
        }
      ],
      "relevance": "100"
    }
  ],
  "time_expression_list": [],
  "money_expression_list": [
    {
      "form": "the $6 million from the robbery",
      "amount": "6 million",
      "numeric_value": "6e+06",
      "currency": "USD",
      "inip": "25",
      "endp": "55"
    }
  ],
  "uri_list": [],
  "phone_expression_list": [],
  "other_expression_list": [],
  "quotation_list": [
    {
      "form": "he wanted 50% of the $6 million from the robbery that occurred in London",
      "who": "He",
      "who_lemma": "he",
      "verb": "said",
      "verb_lemma": "say",
      "inip": "8",
      "endp": "79"
    }
  ],
  "relation_list": [
    {
      "form": "He said he wanted 50% of the $6 million from the robbery that occurred in London",
      "inip": "0",
      "endp": "79",
      "subject": {
        "form": "He",
        "lemma_list": [
          "he"
        ],
        "sense_id_list": [
          "PRONHUMAN"
        ]
      },
      "verb": {
        "form": "said",
        "lemma_list": [
          "say"
        ],
        "sense_id_list": [
          "ODENTITY_COMMUNICATION_PROCESS",
          "ODENTITY_LINGUISTIC_COMMUNICATION",
          "ODENTITY_PROCESS"
        ]
      },
      "complement_list": [
        {
          "form": "he wanted 50% of the $6 million from the robbery that occurred in London",
          "type": "isDirectObject"
        }
      ],
      "degree": "1"
    },
    {
      "form": "He said he wanted 50% of the $6 million from the robbery that occurred in London",
      "inip": "8",
      "endp": "79",
      "subject": {
        "form": "he",
        "lemma_list": [
          "he"
        ],
        "sense_id_list": [
          "PRONHUMAN"
        ]
      },
      "verb": {
        "form": "wanted",
        "lemma_list": [
          "want"
        ],
        "sense_id_list": [
          "ODENTITY_INTENTIONAL_PSYCHOLOGICAL_PROCESS",
          "ODENTITY_LINGUISTIC_COMMUNICATION"
        ]
      },
      "complement_list": [
        {
          "form": "50% of the $6 million from the robbery that occurred in London",
          "type": "isDirectObject"
        }
      ],
      "degree": "1"
    },
    {
      "form": "He said he wanted 50% of the $6 million from the robbery that occurred in London",
      "inip": "57",
      "endp": "79",
      "subject": {
        "form": "50% of the $6 million from the robbery that occurred in London",
        "lemma_list": [
          "50%"
        ]
      },
      "verb": {
        "form": "occurred",
        "lemma_list": [
          "occur"
        ],
        "sense_id_list": [
          "ODENTITY_INTENTIONAL_PSYCHOLOGICAL_PROCESS",
          "ODENTITY_PROCESS"
        ]
      },
      "complement_list": [
        {
          "form": "in London",
          "type": "isLocationComplement"
        }
      ],
      "degree": "2"
    }
  ]
}

Topics 2.0
{
  "status": {
    "code": "0",
    "msg": "OK",
    "credits": "1"
  },
  "entity_list": [
    {
      "form": "London",
      "id": "01d0d69c7d",
      "sementity": {
        "class": "instance",
        "fiction": "nonfiction",
        "id": "ODENTITY_CITY",
        "type": "Top>Location>GeoPoliticalEntity>City"
      },
      "semgeo_list": [
        {
          "adm1": {
            "form": "England",
            "id": "98db781864"
          },
          "adm2": {
            "form": "Greater London",
            "id": "ed00f6dec4"
          },
          "continent": {
            "form": "Europe",
            "id": "0404ea4d6c"
          },
          "country": {
            "form": "United Kingdom",
            "id": "d29f412b4b",
            "standard_list": [
              {
                "id": "ISO3166-1-a2",
                "value": "GB"
              },
              {
                "id": "ISO3166-1-a3",
                "value": "GBR"
              }
            ]
          }
        }
      ],
      "semld_list": [
        "od:ZW4ud2lraTpMb25kb24",
        "http://en.wikipedia.org/wiki/London",
        "http://es.wikipedia.org/wiki/Londres",
        "http://fr.wikipedia.org/wiki/Londres",
        "http://it.wikipedia.org/wiki/Londra",
        "http://ca.wikipedia.org/wiki/Londres",
        "http://pt.wikipedia.org/wiki/Londres",
        "http://zh.wikipedia.org/wiki/伦敦",
        "http://ar.wikipedia.org/wiki/لندن",
        "http://gl.wikipedia.org/wiki/Londres",
        "http://eu.wikipedia.org/wiki/Londres",
        "http://rdf.freebase.com/ns/m.04jpl",
        "http://sws.geonames.org/2643743/",
        "http://linkedgeodata.org/triplify/node107775",
        "http://data.nytimes.com/14085781296239331901",
        "http://sw.cyc.com/concept/Mx4rvVjWPJwpEbGdrcN5Y29ycA",
        "http://yago-knowledge.org/resource/London",
        "http://umbel.org/umbel/rc/Location_Underspecified",
        "http://umbel.org/umbel/rc/PopulatedPlace",
        "http://umbel.org/umbel/rc/Village",
        "@BBCLondres2012",
        "@LDN",
        "@OlimpicoCaracol",
        "@TelevisaLondres",
        "@TimeOutLondon",
        "@visitlondon",
        "sumo:City"
      ],
      "variant_list": [
        {
          "form": "London",
          "inip": "74",
          "endp": "79"
        }
      ],
      "relevance": "100"
    }
  ],
  "concept_list": [
    {
      "form": "$",
      "id": "^__9145003407816029121",
      "sementity": {
        "class": "class",
        "type": "Top>Unit>Currency"
      },
      "variant_list": [
        {
          "form": "$",
          "inip": "29",
          "endp": "29"
        }
      ],
      "relevance": "100"
    },
    {
      "form": "robbery",
      "id": "c3784c490b",
      "sementity": {
        "class": "class",
        "fiction": "nonfiction",
        "id": "ODENTITY_OFFENCE",
        "type": "Top>OtherEntity>Offence"
      },
      "semld_list": [
        "sumo:Offence"
      ],
      "variant_list": [
        {
          "form": "robbery",
          "inip": "49",
          "endp": "55"
        }
      ],
      "relevance": "100"
    }
  ],
  "time_expression_list": [],
  "money_expression_list": [
    {
      "form": "the $6 million from the robbery",
      "amount_form": "6 million",
      "numeric_value": "6e+06",
      "currency": "USD",
      "inip": "25",
      "endp": "55"
    }
  ],
  "quantity_expression_list": [
    {
      "form": "50% of the $6 million from the robbery that occurred in London",
      "amount_form": "50%",
      "numeric_value": "0.5",
      "unit": "%",
      "inip": "18",
      "endp": "79"
    }
  ],
  "other_expression_list": [],
  "quotation_list": [
    {
      "form": "he wanted 50% of the $6 million from the robbery that occurred in London",
      "who": {
        "form": "He",
        "lemma": "he"
      },
      "verb": {
        "form": "said",
        "lemma": "say"
      },
      "inip": "8",
      "endp": "79"
    }
  ],
  "relation_list": [
    {
      "form": "He said he wanted 50% of the $6 million from the robbery that occurred in London",
      "inip": "0",
      "endp": "79",
      "subject": {
        "form": "He",
        "lemma_list": [
          "he"
        ],
        "sense_id_list": [
          "PRONHUMAN"
        ]
      },
      "verb": {
        "form": "said",
        "lemma_list": [
          "say"
        ],
        "sense_id_list": [
          "ODENTITY_COMMUNICATION_PROCESS",
          "ODENTITY_LINGUISTIC_COMMUNICATION",
          "ODENTITY_PROCESS"
        ]
      },
      "complement_list": [
        {
          "form": "he wanted 50% of the $6 million from the robbery that occurred in London",
          "type": "isDirectObject"
        }
      ],
      "degree": "1"
    },
    {
      "form": "He said he wanted 50% of the $6 million from the robbery that occurred in London",
      "inip": "8",
      "endp": "79",
      "subject": {
        "form": "he",
        "lemma_list": [
          "he"
        ],
        "sense_id_list": [
          "PRONHUMAN"
        ]
      },
      "verb": {
        "form": "wanted",
        "lemma_list": [
          "want"
        ],
        "sense_id_list": [
          "ODENTITY_INTENTIONAL_PSYCHOLOGICAL_PROCESS",
          "ODENTITY_LINGUISTIC_COMMUNICATION"
        ]
      },
      "complement_list": [
        {
          "form": "50% of the $6 million from the robbery that occurred in London",
          "type": "isDirectObject"
        }
      ],
      "degree": "1"
    },
    {
      "form": "He said he wanted 50% of the $6 million from the robbery that occurred in London",
      "inip": "57",
      "endp": "79",
      "subject": {
        "form": "50% of the $6 million from the robbery that occurred in London",
        "lemma_list": [
          "50%"
        ]
      },
      "verb": {
        "form": "occurred",
        "lemma_list": [
          "occur"
        ],
        "sense_id_list": [
          "ODENTITY_INTENTIONAL_PSYCHOLOGICAL_PROCESS",
          "ODENTITY_PROCESS"
        ]
      },
      "complement_list": [
        {
          "form": "in London",
          "type": "isLocationComplement"
        }
      ],
      "degree": "2"
    }
  ]
}
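
For client code that still has to handle stored 1.2 responses during the transition, the quotation and money changes visible in the two outputs above could be bridged with a small adapter. This is a sketch with hypothetical function names, not part of the API; it only covers the quotation_list regrouping and the amount to amount_form rename, and leaves every other field untouched.

```python
OLD_QUOTATION_KEYS = ("who", "who_lemma", "verb", "verb_lemma")


def upgrade_quotation(quotation):
    """Convert a 1.2-style quotation to the 2.0 layout."""
    if isinstance(quotation.get("who"), dict):
        return dict(quotation)  # already in the 2.0 layout
    upgraded = {k: v for k, v in quotation.items()
                if k not in OLD_QUOTATION_KEYS}
    # Group speaker and verb information under one object each.
    # Missing 1.2 fields end up as None.
    upgraded["who"] = {"form": quotation.get("who"),
                       "lemma": quotation.get("who_lemma")}
    upgraded["verb"] = {"form": quotation.get("verb"),
                        "lemma": quotation.get("verb_lemma")}
    return upgraded


def upgrade_money(expression):
    """Rename the 1.2 field amount to the 2.0 field amount_form."""
    upgraded = dict(expression)
    if "amount" in upgraded:
        upgraded["amount_form"] = upgraded.pop("amount")
    return upgraded


def upgrade_response(old):
    """Return a shallow copy of a 1.2 response with 2.0 field layouts."""
    new = dict(old)
    new["quotation_list"] = [
        upgrade_quotation(q) for q in old.get("quotation_list", [])
    ]
    new["money_expression_list"] = [
        upgrade_money(m) for m in old.get("money_expression_list", [])
    ]
    return new
```

Applied to the 1.2 output above, the quotation would carry who as {"form": "He", "lemma": "he"} and verb as {"form": "said", "lemma": "say"}, matching the 2.0 output.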

Again, all the details can be found in the Topics Extraction 2.0 documentation. Remember! Topics Extraction 1.2 will be retired on February 29, so make sure to adapt your integration by then. If you have any questions or issues during the migration, you can reach us through our support form or by writing to us at support@meaningcloud.com.

