Entries

This section describes the entries of a dictionary.

Entries are the elements that define each of the lexical forms that make up a dictionary and that complement the basic resources used by MeaningCloud's APIs. With a new dictionary you can add semantic information to entities and concepts already detected by an API, or extend the resources to improve the application's recall in specific domains.

When you access a dictionary, you can add new entries just by entering their form, which is the only required field. Two other fields can be specified: entry type and ontology type.

First entry created

The first field, entry type, defines whether the entry is going to be an entity or a concept. Entries with no entry type are considered entities by default.

The difference between these two elements is directly related to how we interpret the data and how it relates to our ontology. In general, entities correspond to named entities (proper nouns, etc.), while concepts are keywords of the domain. With respect to the ontology, entities are instances of a type defined in it, while concepts are subclasses of a type of the ontology.

If we use the banking domain as an example, we would define "savings account" and "current account" as concepts of the domain, while specific names of products provided by different banks would be considered entities, for instance "Cuenta Nómina", "ISA" or "Cash Card Account".

Another difference between entities and concepts is how the APIs detect them, more specifically, the basic variants used to detect their appearances in the text. The variants associated with concepts are mostly morphological, while the variants for entities come from common heuristics that apply to the most frequent types of named entities.

The second field, ontology type, selects the node in the ontology the entry will belong to. By default, every entry is inside Top, the parent node of the ontology. As can be seen in the image, there are several default values that can be chosen. Those values are the most common ones from our ontology.

You can also write your own ontology node through the 'Write your own value' option inside My Types. Every node created by the user will be added as a child of the node Top. Hierarchy is represented by the character >. For instance, to define a node Characters inside the node Person, you would have to create an entry as follows:

Create entry with user defined ontology value

Once you have created an entry with your own ontology_type value, these values will be listed in the menu so you can select them when you create more entries in the dictionary. This way you will avoid having to write the new ontology_type value every time you add a new entry.
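
To make the > notation concrete, here is a minimal sketch of how such a value could be split into its nodes. The exact string shown in the screenshot is not reproduced here; `Top>Person>Characters` is an assumed illustration, and `parse_ontology_type` and `last_node` are hypothetical helper names, not part of any MeaningCloud API.

```python
def parse_ontology_type(value: str) -> list[str]:
    """Split an ontology type such as 'Top>Person>Characters' into its nodes.

    Assumption: nodes are separated by '>' and surrounding spaces are ignored.
    """
    nodes = [node.strip() for node in value.split(">")]
    if any(not node for node in nodes):
        raise ValueError(f"empty node in ontology type: {value!r}")
    return nodes


def last_node(value: str) -> str:
    """Return the deepest node of the hierarchy."""
    return parse_ontology_type(value)[-1]


# Illustrative value only: a user-defined node Characters inside Person.
print(parse_ontology_type("Top>Person>Characters"))
print(last_node("Top>Person>Characters"))
```

The last node is the part of the ontology type that the entries table summarizes for each entry.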

Important

These two fields are a way to edit the semantic information associated with the entry. What the dictionary actually stores is the semantic information; these fields are shortcuts for easily editing its most frequently used fields.

This is what you will see when you add a new entry:

First entry created

If one of the values specified is not correct, you will see an error message specifying why the entry cannot be created. There are two types of errors:

  • Field limitation errors, for errors in the content of a field. The only limited field is form, which cannot be empty and is limited to 255 characters.

    Entry field limitation error

    Entry field size limitation error

  • Plan limitation errors, when you reach the limit of entries available in your plan.

    Plan limitation errors
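
The two kinds of errors can be sketched as a pre-check run before an entry is submitted. This is only an illustration under stated assumptions: the function name, the error messages, and the decision to treat a whitespace-only form as empty are hypothetical; only the 255-character limit and the existence of a plan limit come from the text above.

```python
MAX_FORM_LENGTH = 255  # limit stated in the documentation


def validate_new_entry(form: str, current_entries: int, plan_limit: int) -> list[str]:
    """Return a list of error messages; an empty list means the entry looks valid."""
    errors = []

    # Field limitation errors: form must not be empty or too long.
    # Assumption: a form made only of whitespace counts as empty.
    if not form.strip():
        errors.append("form cannot be empty")
    elif len(form) > MAX_FORM_LENGTH:
        errors.append(f"form is longer than {MAX_FORM_LENGTH} characters")

    # Plan limitation error: no room left for another entry.
    if current_entries >= plan_limit:
        errors.append("plan entry limit reached")

    return errors


print(validate_new_entry("Cash Card Account", current_entries=10, plan_limit=100))
print(validate_new_entry("", current_entries=100, plan_limit=100))
```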

For each entry created you will see its basic information: its form, entry type, last node of the ontology type and when it was last edited.

In the first column of the table there are two actions associated to the entry:

  • Edit icon, to open the editing view and modify the entry.
  • Delete icon, to delete the entry and all its contents from the dictionary.

Every entry created will be shown in this table, where you will be able to select how many entries to show in each page, order them by any of the columns and filter them dynamically by text appearance. This dynamic filtering will also include the aliases field, even though it is not shown in the summary.
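
The table behavior described above (text filtering that also looks at aliases, ordering by a column, and paging) can be approximated as follows. This is a rough sketch, not the interface's actual implementation: the `EntryRow` class, the case-insensitive matching, and the sample aliases are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EntryRow:
    form: str
    entry_type: str = "entity"  # entries are entities by default
    aliases: list[str] = field(default_factory=list)


def filter_entries(rows: list[EntryRow], text: str) -> list[EntryRow]:
    """Keep rows whose form or any alias contains the text (case-insensitive)."""
    needle = text.lower()
    return [
        row for row in rows
        if needle in row.form.lower()
        or any(needle in alias.lower() for alias in row.aliases)
    ]


def page(rows: list[EntryRow], page_number: int, page_size: int) -> list[EntryRow]:
    """Return one page of rows; page numbers start at 1."""
    start = (page_number - 1) * page_size
    return rows[start:start + page_size]


rows = [
    EntryRow("ISA", aliases=["Individual Savings Account"]),
    EntryRow("savings account", entry_type="concept"),
    EntryRow("Cash Card Account"),
]

# "ISA" matches through its alias even though the alias is not a table column.
matches = filter_entries(rows, "savings")
ordered = sorted(matches, key=lambda row: row.form.lower())
print([row.form for row in page(ordered, page_number=1, page_size=10)])
```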

To modify any field of an entry, you will have to access its editing view. From there, you can modify all of its information.

Edit entry

Both the sidebar and the main panel feature a question mark icon, which starts a short tour that briefly explains each field.

The fields are divided into two sections:

  • Basic information, with basic information about the entry.
  • Semantic information, to define the semantic information associated to the entry.

The basic fields of an entry are ID, Form, Aliases and Sense ID, and only Form is mandatory when creating an entry.

  • ID: the identifier that uniquely identifies each entry in the system. It is a unique alphanumeric code created automatically for each new entry. It is not editable.
  • Form: the lexical form of the entry, and how it is identified when found.
  • Aliases: alternative ways in which a text can reference an entry. Our extraction APIs already take into account the most common variants of the form in order to detect aliases automatically, so those do not need to be added. For instance, if a full name is specified, appearances of just the first or last name in the text will be considered aliases of the main form; in the same way, if a common noun is added, its plurals will be considered aliases. Each alias must be written on a new line.
  • Sense ID: an identifier associated with the sense of the entry, that is, with its semantic information. The same form can be added as several entries with different semantic analyses, for instance, something that is both a last name and the name of a city. By default, the sense ID takes the value of the entry ID.

    There's an additional option associated to this field that allows you to choose to use only the senses specified in the user dictionary.

    Use only my senses check

    If the check is selected and an entry has a sense in the user dictionary and several other senses in the basic resources, the senses from the basic resources will be ignored and only the ones from the user dictionary will be used.

    This option is encoded in the dictionary by adding a dash to the sense ID. The dash is not shown in the editing interface, but it will appear in the result of exporting a dictionary.
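
The rules for the basic fields can be summarized in a small sketch. All function names here are hypothetical, and the code only models the behavior described above; it does not model the dash encoding of the sense ID, since the text does not specify where the dash is placed.

```python
def parse_aliases(text: str) -> list[str]:
    """Split the aliases field into one alias per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def effective_sense_id(entry_id: str, sense_id: str = "") -> str:
    """The sense ID defaults to the entry ID when none is given."""
    return sense_id or entry_id


def senses_to_use(user_senses: list[str], base_senses: list[str],
                  only_my_senses: bool) -> list[str]:
    """Pick the senses considered for a form.

    With the check selected, senses from the basic resources are ignored
    as long as the user dictionary provides at least one sense.
    """
    if only_my_senses and user_senses:
        return list(user_senses)
    return list(user_senses) + list(base_senses)


# Illustrative identifiers only; real IDs are generated by the system.
print(parse_aliases("First alias\n\nSecond alias\n"))
print(effective_sense_id("abc123"))
print(senses_to_use(["user_city"], ["base_lastname", "base_city"], only_my_senses=True))
```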

At the bottom of the Basic information there's a section with some advanced settings that can be added to the entry. This section will appear collapsed if the default values are used.

Advanced settings fields

There are three fields available in this section:

  • Lemma: the lemma of the entry. By default, the lemma is the form of the entry.
  • Tag: the part-of-speech tag assigned to the entry. It must follow the format defined by the core engine in each language. If the tag is empty, the entry is still detected as a topic but the tokenization process is not affected, that is, multiword entries are not grouped into a single token. It is limited to 15 characters. By default, entities are tagged as proper nouns, while concepts do not have a tag.
  • Accept only the exact form: lets you search only for the exact form defined, instead of also searching for its morphological variants. By default, it is enabled for entities and disabled for concepts.
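
The defaults for these three fields depend on the entry type, which the following sketch tries to capture. The class and function names are hypothetical, and `PROPER_NOUN_TAG` is only a placeholder: the real tag must follow the format defined by the core engine for each language, which is not covered here.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TAG_LENGTH = 15       # limit stated in the documentation
PROPER_NOUN_TAG = "NP"    # placeholder value, NOT the engine's real tag


@dataclass
class AdvancedSettings:
    lemma: str
    tag: Optional[str]
    exact_form_only: bool


def default_settings(form: str, entry_type: str = "entity") -> AdvancedSettings:
    """Build the default advanced settings for an entity or a concept."""
    is_entity = entry_type == "entity"
    return AdvancedSettings(
        lemma=form,                                  # lemma defaults to the form
        tag=PROPER_NOUN_TAG if is_entity else None,  # concepts have no tag
        exact_form_only=is_entity,                   # on for entities, off for concepts
    )


def is_valid_tag(tag: Optional[str]) -> bool:
    """An empty tag is allowed; otherwise it must fit in 15 characters."""
    return not tag or len(tag) <= MAX_TAG_LENGTH


print(default_settings("Cuenta Nómina"))
print(default_settings("savings account", entry_type="concept"))
```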