Topic Extraction is MeaningCloud's solution for extracting relevant pieces of information from unstructured text:

  • Named entities: people, organizations, places, etc.
  • Concepts: significant keywords
  • Time and money expressions
  • Quantity expressions
  • Quotes
  • Relations
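The Python snippet below is a rough sketch of how these element types could map onto a request. It builds a form-encoded call that asks only for entities and concepts. Several details are our assumptions about the 2.0 interface, not confirmed by this page: the endpoint URL, the parameter names (`key`, `lang`, `txt`, `tt`) and the one-letter topic-type codes. Check them against the API documentation. The helper `build_topics_request` is hypothetical. It only prepares the request and sends nothing.

```python
from urllib.parse import urlencode

# Assumed endpoint for Topic Extraction 2.0 (verify in the documentation).
TOPICS_URL = "https://api.meaningcloud.com/topics-2.0"

# Assumed one-letter codes for the tt (topic types) parameter,
# one per element type listed above.
TOPIC_TYPE_CODES = {
    "entities": "e",
    "concepts": "c",
    "time_expressions": "t",
    "money_expressions": "m",
    "quantity_expressions": "n",
    "quotations": "q",
    "relations": "r",
}


def build_topics_request(key, text, lang, types):
    """Return (url, form-encoded body) asking only for the given element types."""
    unknown = [t for t in types if t not in TOPIC_TYPE_CODES]
    if unknown:
        raise ValueError(f"unknown element types: {unknown}")
    tt = "".join(TOPIC_TYPE_CODES[t] for t in types)
    body = urlencode({"key": key, "lang": lang, "txt": text, "tt": tt})
    return TOPICS_URL, body


url, body = build_topics_request(
    "YOUR_LICENSE_KEY", "Apple opened a store in Madrid.", "en", ["entities", "concepts"]
)
print(body)
# key=YOUR_LICENSE_KEY&lang=en&txt=Apple+opened+a+store+in+Madrid.&tt=ec
```

Asking only for the element types you need keeps responses small. The body would then be sent as an HTTP POST. That step is left out so the sketch runs offline.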

This detection process combines a number of complex natural language processing techniques that produce morphological, syntactic and semantic analyses of a text and use them to identify different types of significant elements. The currently supported languages are Spanish, English, French, Italian, Portuguese and Catalan, with partial coverage for Danish, Swedish, Norwegian, Finnish, Chinese, Russian and Arabic.
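Because coverage differs between languages, a client may want to check the language before sending text, for example to warn users about partial coverage. Below is a minimal sketch. It assumes that the `lang` parameter takes ISO 639-1 codes for the languages named above. That is an assumption, and the exact accepted values should be confirmed in the documentation.

```python
# Assumed ISO 639-1 codes for the languages named above; verify the exact
# values accepted by the lang parameter in the documentation.
FULL_COVERAGE = {"es", "en", "fr", "it", "pt", "ca"}
PARTIAL_COVERAGE = {"da", "sv", "no", "fi", "zh", "ru", "ar"}


def coverage(lang: str) -> str:
    """Classify a language code as 'full', 'partial' or 'unsupported'."""
    code = lang.strip().lower()
    if code in FULL_COVERAGE:
        return "full"
    if code in PARTIAL_COVERAGE:
        return "partial"
    return "unsupported"


for lang in ("en", "RU", "de"):
    print(lang, coverage(lang))
# en full
# RU partial
# de unsupported
```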

Differentiators:

  • The API is highly configurable, so you can adapt its behavior to very diverse operating scenarios. You can obtain exactly the type of information relevant to you, and you can also handle different source formats, languages and even language registers.
  • Recognizes names of people, organizations and other entities, classified in a hierarchy of 200 entity types.
  • Extracts multiword concepts (e.g. "financial crisis").
  • Disambiguates and detects co-occurrences in several languages.
  • Users can create their own dictionaries.
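Entity types form a hierarchy, written as paths such as Top>ID>Cashtag in the changelog, so a client can filter results by any branch. Below is a minimal sketch that also picks out multiword concepts. The response fragment is illustrative, not real API output. The field names `entity_list`, `concept_list`, `form`, `sementity` and `type` are assumptions to verify against the documented response format.

```python
import json

# Illustrative response fragment, NOT real API output. Field names
# (entity_list, concept_list, form, sementity, type) are assumptions.
SAMPLE = json.loads("""
{
  "entity_list": [
    {"form": "Christine Lagarde", "sementity": {"type": "Top>Person>FullName"}},
    {"form": "European Central Bank", "sementity": {"type": "Top>Organization"}},
    {"form": "Frankfurt", "sementity": {"type": "Top>Location>GeoPoliticalEntity>City"}}
  ],
  "concept_list": [
    {"form": "financial crisis"},
    {"form": "interest rate"},
    {"form": "bank"}
  ]
}
""")


def entities_of_type(response, prefix):
    """Forms of entities whose ontology path starts with the given node path."""
    wanted = prefix.split(">")
    matches = []
    for entity in response.get("entity_list", []):
        path = entity.get("sementity", {}).get("type", "").split(">")
        # Compare whole path segments so 'Top>Person' does not match 'Top>PersonX'.
        if path[: len(wanted)] == wanted:
            matches.append(entity["form"])
    return matches


def multiword_concepts(response):
    """Concepts made of more than one word, such as 'financial crisis'."""
    return [c["form"] for c in response.get("concept_list", []) if len(c["form"].split()) > 1]


print(entities_of_type(SAMPLE, "Top>Person"))  # ['Christine Lagarde']
print(entities_of_type(SAMPLE, "Top>Location"))  # ['Frankfurt']
print(multiword_concepts(SAMPLE))  # ['financial crisis', 'interest rate']
```

Comparing path segments instead of raw string prefixes keeps sibling types with similar names from leaking into a branch.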

You can also use your own resources in the extraction process by creating a dictionary through our customization engine.
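The changelog refers to user dictionaries through a `ud` parameter. The sketch below shows one way a client might attach dictionaries to an existing set of request parameters. Two things are assumptions to check in the documentation: that several dictionary names are joined with `|`, and that `ud` takes dictionary names directly. The dictionary names `product_names` and `internal_jargon`, and the helper `with_user_dictionaries`, are hypothetical.

```python
from urllib.parse import urlencode


def with_user_dictionaries(params, dictionaries):
    """Return a copy of params that also names the user dictionaries to apply.

    Assumptions: the parameter is called ud and several dictionary names
    are joined with '|'. Check both against the documentation.
    """
    merged = dict(params)  # leave the caller's dict untouched
    if dictionaries:
        merged["ud"] = "|".join(dictionaries)
    return merged


base = {"key": "YOUR_LICENSE_KEY", "lang": "en", "txt": "Our Q3 roadmap", "tt": "ec"}
# Hypothetical dictionary names created through the customization engine.
params = with_user_dictionaries(base, ["product_names", "internal_jargon"])
print(urlencode(params))
# key=YOUR_LICENSE_KEY&lang=en&txt=Our+Q3+roadmap&tt=ec&ud=product_names%7Cinternal_jargon
```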

Documentation

Everything and anything you need to take advantage of this API's full potential.

Test Console

Choose an input and a configuration, and immediately check the results!

Developer Tools

Do you want to integrate this API into your environment? Check our Developer Tools!

Changelog

Version 2.0 (latest update: 22/September/2020)

2.0.35 (22/September/2020)

  • Resources have been updated.

2.0.34 (10/September/2020)

  • Minor bugs have been fixed and resources have been updated.

2.0.33 (05/May/2020)

  • Terminology associated with the COVID-19 crisis has been added.
  • Minor bugs have been fixed and resources have been updated.

2.0.32 (12/February/2020)

  • The concept extraction algorithm has been modified to avoid false positives.
  • Minor bugs have been fixed and resources have been updated.

2.0.31 (06/February/2020)

  • Coverage for full names detected through heuristic rules has been improved in Arabic, Chinese and Russian.

2.0.30 (14/January/2020)

  • The ontology value CASHTAG (Top>ID>Cashtag) has been renamed to the more generic TICKER, and ticker detection has been improved.
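Clients that stored results from versions before 2.0.30 may want to treat the old and new types as the same thing. Below is a minimal sketch of that normalization. The old path Top>ID>Cashtag comes from the entry above. The new path Top>ID>Ticker is our assumption and should be confirmed in the ontology.

```python
# Old ontology path for cashtags, as given in the changelog.
LEGACY_CASHTAG = "Top>ID>Cashtag"
# Assumed new path after the rename to TICKER; confirm in the ontology.
TICKER = "Top>ID>Ticker"


def normalize_type(path: str) -> str:
    """Map a stored pre-2.0.30 entity type onto its renamed equivalent."""
    if path == LEGACY_CASHTAG or path.startswith(LEGACY_CASHTAG + ">"):
        return TICKER + path[len(LEGACY_CASHTAG):]
    return path


print(normalize_type("Top>ID>Cashtag"))  # Top>ID>Ticker
print(normalize_type("Top>Person>FullName"))  # Top>Person>FullName
```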

2.0.29 (18/December/2019)

  • Partial support has been added for Arabic.

2.0.28 (04/December/2019)

  • Partial support has been added for Russian.

2.0.27 (20/November/2019)

  • Partial support has been added for Chinese.

2.0.26 (24/October/2019)

  • Minor bugs have been fixed and resources have been updated.

2.0.25 (04/September/2019)

  • Minor bugs have been fixed and resources have been updated.

2.0.24 (03/April/2019)

  • Resources have been updated.

2.0.23 (13/March/2019)

  • Minor bugs have been fixed, and resources have been updated.

2.0.22 (22/January/2019)

  • Entity detection has been improved for companies and initialisms.
  • Minor bugs have been fixed, and resources have been updated.

2.0.21 (12/December/2018)

  • A bug in the recursive search of money expressions has been fixed.
  • Detection of entity variants has been improved.
  • Minor bugs have been fixed, and resources have been updated.

2.0.20 (06/November/2018)

  • Money expression detection and overall performance when processing HTML content have been improved.
  • Minor bugs have been fixed, and resources have been updated.

2.0.19 (22/August/2018)

  • Money expression detection and the heuristics for company detection have been improved.
  • Minor bugs have been fixed, and resources have been updated.

2.0.18 (19/July/2018)

2.0.17 (28/June/2018)

  • Minor bugs have been fixed, and resources have been updated.

2.0.16 (31/May/2018)

  • Time and quantity expressions have been refactored to follow more coherent criteria.
  • Address detection has been improved, including zip code detection.
  • Minor bugs in heuristic detection have been fixed.
  • Resources have been updated.

2.0.15 (12/April/2018)

  • Resources have been updated.

2.0.14 (03/April/2018)

  • Minor bugs related to the heuristic detection of entities have been fixed.

2.0.13 (18/January/2018)

  • Minor bugs have been fixed, and resources have been updated.
  • Relevance calculations have been improved for concept and entity detection.

2.0.12 (13/November/2017)

  • Minor bugs have been fixed, and resources have been updated.

2.0.11 (19/September/2017)

  • Heuristic detection of companies has been improved.
  • Detection of time and quantity expressions has been improved.
  • Several minor bugs have been fixed, and resources have been updated.

2.0.10 (26/June/2017)

  • Several minor bugs have been fixed, and resources have been updated.

2.0.9 (27/March/2017)

  • Several minor bugs have been fixed, and resources have been updated, especially keywords.
  • Heuristic detection of entities has been improved.
  • Variant generation for entity detection has become stricter in order to avoid false positives.

2.0.8 (27/October/2016)

  • A bug affecting certain texts containing parentheses has been fixed.
  • Several minor bugs have been fixed, and resources have been updated.
  • Money expression detection in Italian has been improved.
  • Heuristic detection in all languages has been improved.

2.0.7 (27/July/2016)

  • Several minor bugs have been fixed, and resources have been updated.

2.0.6 (13/June/2016)

  • Several minor bugs have been fixed, and resources have been updated.

2.0.5 (26/April/2016)

  • Several minor bugs have been fixed, and resources have been updated.

2.0.4 (07/April/2016)

  • Several minor bugs have been fixed, and resources have been updated.

2.0.3 (02/March/2016)

  • The precision of the entity types assigned by heuristic detection has been improved.
  • Several minor bugs have been fixed, and resources have been updated.

2.0.2 (02/February/2016)

  • Several minor bugs have been fixed, and resources have been updated.

2.0.1 (22/December/2015)

  • Several minor bugs have been fixed, and resources have been updated.

2.0.0 (01/December/2015)

  • A new element, quantity_expression, has been added.
  • uri_expression and phone_expressions have been integrated into entity_list for greater coherence.
  • Traceability with user dictionaries has been improved.
  • The disambiguation parameters have been restructured to make their purpose clearer.
  • An interface language can now be specified, making it easier to work with multilingual sources.
  • The standard element is now consistent everywhere it appears.
  • Some fields in the output of money_expression and quotation have been changed to improve usability.

Version 1.2 (latest update: 02/March/2016)

1.2.14 (02/March/2016)

  • Version retired.

1.2.13 (22/December/2015)

  • Resources have been updated.

1.2.12 (01/December/2015)

  • Several minor bugs have been fixed, and resources have been updated.

1.2.11 (06/October/2015)

  • Several minor bugs have been fixed, and resources have been updated.

1.2.10 (09/September/2015)

  • Significant improvements have been added to URL and HTML text processing.
  • Several minor bugs have been fixed, and resources have been updated.

1.2.9 (28/July/2015)

  • Resources have been updated.

1.2.8 (14/July/2015)

  • Several minor bugs have been fixed, and resources have been updated.

1.2.7 (02/June/2015)

  • Several minor bugs have been fixed, and resources have been updated.
  • Python client has been improved.
  • The relaxed typography parameter, rt, now accepts a new value related to the user dictionary parameter, ud.

1.2.6 (18/May/2015)

  • Several minor bugs have been fixed.
  • CASHTAG has been added as a new node of the ontology.
  • Resources have been updated (including cashtag elements).
  • A memory leak related to user dictionaries has been fixed.
  • Smart prefix detection has been improved.
  • For English the following points have been improved:
    • Disambiguation between common and proper nouns.
    • Use of stop words depending on the typography.

1.2.5 (06/April/2015)

  • Several minor bugs and minor concurrency problems have been fixed.
  • Resources have been updated.
  • Suggestions for unknown words have been improved, especially for short words, based on typing mistakes and repeated letters.
  • Smart typography detection has been added.

1.2.4 (24/June/2014)

  • Several minor bugs have been fixed, and resources have been updated.

1.2.3 (20/May/2014)

  • Several minor bugs have been fixed, and resources have been updated.

1.2.2 (17/March/2014)

  • Several bugs in entity detection have been fixed, and resources have been updated.
  • Response time of the documentation pages has been improved.

1.2.1 (04/February/2014)

  • Several minor bugs have been fixed, and resources have been updated.
  • Heuristic rules for entity detection have been improved, increasing the quantity and the classification quality of the unknown entities detected.

1.2 (23/September/2013)

  • Attribute naming for semantic information has been standardized so that every element that can be an array has '_list' in its name. This allows flexibility when it comes to defining new attributes and ensures that the output will always be the same regardless of the number of values the specific case has.
  • The response headers have been updated so that the content type is correct for all output formats supported.
  • Resources have been upgraded.
  • Bugs reported through our feedback section have been fixed.
  • Error messages in all APIs have been unified.
  • Related Facebook and Twitter links have been added to the semantic linked data information (semld) of known entities and concepts.
  • The documentation has been improved, both in format and contents.
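The '_list' convention from version 1.2 means client code never has to special-case a single result. Below is a minimal sketch that summarizes any response by counting the items in every '_list' field. The sample response is illustrative, not real API output, and its field names are assumptions.

```python
def count_elements(response: dict) -> dict:
    """Count the items of every field that follows the '_list' convention.

    Because such fields are always arrays, len() works whether the text
    produced zero, one or many values; no single-object special case is needed.
    """
    return {
        name: len(items)
        for name, items in response.items()
        if name.endswith("_list") and isinstance(items, list)
    }


# Illustrative response, not real API output; field names are assumptions.
response = {
    "status": {"code": "0", "msg": "OK"},
    "entity_list": [{"form": "Madrid"}],
    "concept_list": [],
    "quotation_list": [{"form": "We will grow"}, {"form": "Not this year"}],
}
print(count_elements(response))
# {'entity_list': 1, 'concept_list': 0, 'quotation_list': 2}
```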


Languages

  • English
  • Spanish
  • French
  • Italian
  • Portuguese
  • Catalan
  • Danish
  • Swedish
  • Norwegian
  • Finnish
  • Chinese
  • Russian
  • Arabic

Contact Us

Do you have any questions? Have you detected a bug? Contact us through our feedback section or at support@meaningcloud.com