Topics extraction & named entity recognition

Extract the most relevant information

Topics Extraction tags the names of people, places and organizations in any type of content, making it easier to find and to link to other content. Tagging this information helps structure any type of unstructured information (text, audio or video) and gives it semantic markup.

MeaningCloud's Topics Extraction API

This API extracts the most relevant information from a text, such as the people, places, organizations or products it mentions, known as named entities. It also identifies the main concepts and other relevant data such as dates, phone numbers, money amounts or electronic addresses (URLs, emails, hashtags). These entities, concepts and values provide a semantic representation of a document, which makes it possible to develop intelligent applications that process content in several languages. The analysis is not limited to identifying a mentioned entity: through coreference analysis, the entity is linked to external resources that represent it, such as Wikipedia or Linked Data.

MeaningCloud identifies this information in any type of text, whether it is a web page, a piece of news, social network content or an audio or video transcript. It carries out the analysis not only in different languages (multi-language), but also using a common set of types across them (multilingual). This hierarchy of entity types, known as an ontology, contains more than 200 classes, which makes it possible to say, for example, that Google is an organization and a software company at the same time.
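The idea of a type hierarchy can be illustrated with a minimal sketch. The class names and the `PARENT` table below are assumptions made up for this example, not MeaningCloud's actual ontology; the point is only that an entity tagged with a specific class also belongs to every ancestor of that class.

```python
# Hypothetical slice of an entity-type hierarchy: each class maps to its parent.
# These names are illustrative assumptions, not MeaningCloud's real ontology.
PARENT = {
    "Organization": "Top",
    "Company": "Organization",
    "SoftwareCompany": "Company",
    "Location": "Top",
    "City": "Location",
}


def ancestors(cls):
    """Return the class itself followed by every ancestor up to the root."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(cls, target):
    """True if `cls` is `target` or one of its descendants."""
    return target in ancestors(cls)


# Hypothetical tagging result: entity name -> most specific class.
entity_types = {"Google": "SoftwareCompany", "Barcelona": "City"}

print(ancestors(entity_types["Google"]))
print(is_a(entity_types["Google"], "Organization"))
```

Storing only the most specific class and deriving the rest keeps each annotation compact while still letting an application filter by a broad type such as "Organization".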

Moreover, you can add your own dictionaries to extend MeaningCloud's entity and concept tagging capabilities and adapt them to a different domain or to your application's requirements. Do you need to analyze documents on biomedicine? You can incorporate the names of drugs, active ingredients or diseases to semantically analyze scientific literature.
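As a rough illustration of what a user dictionary contributes, the sketch below tags a text against a small hand-made term list. The dictionary contents, the type labels and the `tag_with_dictionary` function are all hypothetical; the actual dictionary format and matching logic of the service are not described here.

```python
import re

# Hypothetical user dictionary: lowercase surface form -> entity type.
BIOMED_DICTIONARY = {
    "ibuprofen": "Drug",
    "acetylsalicylic acid": "ActiveIngredient",
    "migraine": "Disease",
}


def tag_with_dictionary(text, dictionary):
    """Return (start, end, matched text, type) for each dictionary hit, in text order.

    Assumes every dictionary key is lowercase.
    """
    # Longest entries first so multiword names win over shorter overlapping ones.
    forms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


for hit in tag_with_dictionary("Ibuprofen is often prescribed for migraine.", BIOMED_DICTIONARY):
    print(hit)
```

Exact string matching like this misses inflected forms and spelling variants, which is one reason to combine a dictionary with a full linguistic analysis rather than rely on it alone.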

Advantages of automating information extraction. Applications

The annotation of entities, as well as their classification and disambiguation, improves information retrieval, search engine positioning and the recommendation of related content. It is also a basic step for further semantic processing, such as extracting relations or tagging the sentiment associated with an entity.

Competitive intelligence

Extract the most relevant entities and concepts from any piece of news or web content to monitor trends and create business intelligence applications.

Social media analysis

Find out the subjects and interests of your target audience on social networks. Identify the trends associated with the conversation topics.

Search and content recommendation

Tag your content or your products with categories to aid navigation or to detect related content on your website.

Highlights of our Topics Extraction API

Types of entities

Not only people, places and organizations. Use an extended hierarchy with more than 200 entity types and subtypes.

Concepts extraction

Tag and group the main concepts, including multiword ones (e.g. "financial crisis"), and find out their relevance.

Quotes and other relevant data

Tag quotes or indirect speech and identify in the text who they are attributed to. Ideal for analyzing news and social networks. Also extract other relevant data such as dates, money amounts or phone numbers.
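To give a feel for the "other relevant data" mentioned above, here is a deliberately simple sketch based on regular expressions. The patterns and the `extract_values` function are assumptions for illustration only: they cover ISO-style dates, amounts prefixed by a currency symbol and international phone numbers starting with `+`, which is far narrower than what a production extractor would handle.

```python
import re

# Illustrative patterns only; each one covers a single common format.
PATTERNS = {
    # ISO dates such as 2015-03-12
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    # Amounts such as $1,250.50 or €300
    "money": re.compile(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d+)?"),
    # International numbers such as +34 91 123 4567
    "phone": re.compile(r"\+\d{1,3}(?:[ -]\d{2,4}){2,4}"),
}


def extract_values(text):
    """Return every match of each pattern, keyed by value type."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = "On 2015-03-12 the company paid $1,250.50; call +34 91 123 4567 for details."
print(extract_values(sample))
```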

Disambiguation of entities and coreference

When several entities share the same name, use contextual hints to identify which one is mentioned and its type. Do not confuse the city of Barcelona with the football team.
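One simple way to picture context-based disambiguation is to score each candidate by how many of its hint words appear around the mention. The sketch below does exactly that; the candidate list, the hint words and the `disambiguate` function are hypothetical, and this word-overlap heuristic is a stand-in for illustration, not a description of how MeaningCloud resolves entities.

```python
import re

# Hypothetical candidates sharing the name "Barcelona", each with context hint words.
CANDIDATES = {
    "Barcelona": [
        {
            "id": "Barcelona (city)",
            "type": "City",
            "hints": {"city", "catalonia", "port", "mayor", "tourism"},
        },
        {
            "id": "FC Barcelona",
            "type": "SportsTeam",
            "hints": {"match", "league", "goal", "coach", "stadium"},
        },
    ]
}


def disambiguate(name, context):
    """Pick the candidate whose hint words overlap the context the most.

    On a tie, the first candidate listed wins.
    """
    words = set(re.findall(r"\w+", context.lower()))
    return max(CANDIDATES[name], key=lambda c: len(c["hints"] & words))


print(disambiguate("Barcelona", "Barcelona scored a late goal to win the match")["id"])
print(disambiguate("Barcelona", "The mayor of Barcelona promoted tourism in the city")["id"])
```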

Multiple languages

Extract entities with a common taxonomy for Spanish, English, French, Italian and Catalan.

Wikipedia and Linked Data

Link tagged entities to Wikipedia pages or Linked Data cloud resources like Freebase and DBpedia.
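For a sense of what such links look like, the sketch below builds a Wikipedia page URL and the corresponding DBpedia resource URI from an article title. It assumes the entity's English Wikipedia title is already known; the helper names `wikipedia_url` and `dbpedia_uri` are made up for this example and are not part of the API.

```python
from urllib.parse import quote


def _title_to_path(title):
    """Wikipedia-style path segment: spaces become underscores, the rest is percent-encoded."""
    return quote(title.replace(" ", "_"))


def wikipedia_url(title, lang="en"):
    """URL of the Wikipedia article with the given title."""
    return f"https://{lang}.wikipedia.org/wiki/{_title_to_path(title)}"


def dbpedia_uri(title):
    """DBpedia resource URI, which mirrors the English Wikipedia title."""
    return f"http://dbpedia.org/resource/{_title_to_path(title)}"


for title in ("Barcelona", "FC Barcelona"):
    print(wikipedia_url(title), dbpedia_uri(title))
```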