Rules syntax

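In this section, we introduce the language used to create rules. It covers basic elements, morphological aspects, semantic aspects, logical operators, context operators, distance operators, regular expressions, and other elements (macros and categories).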

After we've seen all the possible operators that can be used with the elements we can define, we will see some examples that will illustrate how they can be combined.

Please note that rules always apply in the defined context for the model. This is determined by the split sentences parameter in the model settings. If split sentences is enabled, the engine will try to match each rule against each sentence in the text.

Basic elements

There are three types of elements:

  • Simple term: The semantic unit represented by a single word, which the system processes as one element.
  • Parsing multiword: The semantic unit made of two or more words joined by underscores. Two or more words must be written as a parsing multiword when the system treats them as a single element. You can check the elements returned by the system using the test parsing console.
  • Multiword or Literal expression (word-to-word): It matches exactly what is written between quotation marks, that is, those specific words in that same order.

The following table shows a few examples of each element:

Element Rule Example
Simple term tree

"Trees have a trunk, with supporting branches and leaves in most species."

Simple term butterfly

"Adult butterflies have large and brightly coloured wings."

Simple term play

"My son usually plays football on Sunday."

"I've never seen such a good play."

Parsing multiword El_Salvador

"El Salvador is the smallest and the most densely populated country in Central America."

Parsing multiword scuba_diver

"Scuba diving equipment allows you to visit the underwater world."

"The scuba diver spent the whole day practicing the new technique."

Parsing multiword vice_president

"The vice president is called the deputy president"

Literal expression (multiword) "sense of direction"

"He has no sense, he gave you the wrong direction."

"He has no sense of direction."

Literal expression (multiword) "tech|technical|product support|assistance|service"

"The service support was fantastic."

"The technical support was fantastic."

"The product offers no support."

"They offer great product support."

Literal expression (multiword) "I get lost|confused"

"I often get lost."

"You get confused easily."

"I get confused easily."

Did you notice...?

When defining literal expressions (multiwords), you can use a pipe, |, to include multiple simple terms, as seen in the last row.
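
As a rough illustration of what the pipe does inside a literal expression, the following Python sketch expands an expression into every word sequence it covers. The function name expand_literal is hypothetical, and the sketch assumes that each space-separated position offers its alternatives independently, which is how the table examples read; it is not the engine's implementation.

```python
from itertools import product


def expand_literal(expression: str) -> list[str]:
    """Expand a literal expression with pipes into the word sequences it covers."""
    # Each space-separated position may list alternatives separated by "|".
    slots = [slot.split("|") for slot in expression.split()]
    # Combine one alternative per position, keeping the original word order.
    return [" ".join(words) for words in product(*slots)]


print(expand_literal("I get lost|confused"))
# ['I get lost', 'I get confused']
```

Under the same assumption, "tech|technical|product support|assistance|service" expands to nine two-word sequences, including "technical support" and "product support" but not "service support".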

Morphological aspects

1. Form and lemma

A lemma is the canonical form, dictionary form, or citation form of a set of words. To specify this feature in the model, use "L@" right before the term (L@term). The rule will then take all possible forms of that lemma into account. At the same time, homonymous words that come from different lemmas, and therefore have different meanings, will not be considered (e.g. 'heading' includes both the gerund of the verb 'to head' and the noun; L@heading includes the noun in its singular and plural forms, but not the verbal form).

Forms are different morphological variations of a given lemma. To specify this feature in the model, use "F@" right before the term (F@term). If a form is used, the rule will only work for that specific form.

Element Example
play

"I usually play football on Sunday."

"I will play football next weekend."

"He is playing football right now."

"I've never seen such a good play."

F@play

"I usually play football on Sunday."

"I will play football next weekend."

"He is playing football right now."

"I've never seen such a good play."

double_tap

"She gave me a double tap."

"I double tapped my new friend on Instagram."

F@double_tap

"She gave me a double tap."

"I double tapped my new friend on Instagram."

"Good|bad|great promotion|F@deals"

"I always get good deals."

"That was a great deal"

L@heading

"Please check the headings before publishing."

"She's heading home right now."

Important

Models assume lemma as the default unless a specific form (F@) is specified. Parsing multiwords can also be used with the form and lemma features, as seen in the previous examples.
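
The difference between lemma and form matching can be sketched in Python as follows. The LEMMAS table and the term_matches function are hypothetical and exist only for illustration: the real engine derives lemmas from its own linguistic resources, and this simplified sketch does not model the homonym disambiguation described for L@heading.

```python
# Hypothetical mini-lexicon mapping surface forms to lemmas (an assumption for
# illustration; the real engine uses its own linguistic resources).
LEMMAS = {
    "play": "play", "plays": "play", "playing": "play", "played": "play",
    "deal": "deal", "deals": "deal",
}


def term_matches(rule_term: str, token: str) -> bool:
    """Return True if a single token satisfies a single rule term."""
    token = token.lower()
    if rule_term.startswith("F@"):
        # Form: only the exact surface form matches.
        return token == rule_term[2:].lower()
    # Lemma: written explicitly (L@term) or implied by a bare term.
    term = rule_term[2:] if rule_term.startswith("L@") else rule_term
    return LEMMAS.get(token, token) == term.lower()


print(term_matches("play", "playing"))    # True: same lemma
print(term_matches("F@play", "playing"))  # False: different form
print(term_matches("F@deals", "deal"))    # False: the rule asks for "deals"
```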

2. Parts of speech

Several grammatical features can be specified according to part of speech. They help disambiguate a rule according to the grammatical function that a word or group of words carries out in the text. To specify a grammatical aspect, add @ and the letter that represents the aspect right after the term.

These are the possible features available:

Aspect Description Rule Example
@V Verb smash@V

"He heard a smash of glass."

"The thief smashed the window."

@V1 First person of the verb eat@V1

"James will eat as much as possible."

"I eat around noon."

@V2 Second person of the verb eat@V2

"They eat together on Tuesdays."

"You are eating too much."

@V3 Third person of the verb eat@V3

"You can play after you eat."

"His father is eating by himself."

@V- Non-personal forms of the verb eat@V-

"You can play after you eat."

"Eating is necessary."

@N Noun advert@N

"I have already adverted to the solar revolution."

"The adverts you see are brand new."

@A Adjective mean@A

"That doesn't mean the same thing."

"He's a very mean person."

@E Adverb "worked hard@E"

"He worked hard jobs all his life."

"He worked hard all his life."

@T Article A@T

"The next chord is A."

"A guitar is an instrument."

@D Demonstrative "@D person|woman|man"

"The woman dropped the bag."

"That man dropped the bag."

@M Numeral "@M euros|dollars"

"He has some euros left."

"He owes me 10 euros."

@P Personal pronoun "@P can_withdraw|may_withdraw"

"He never will withdraw his support."

"She can withdraw her consent."

@S Possessive "@S {family}"

"The family is coming."

"His family is coming."

@Y Preposition "@Y people"

"The people he's sharing room with are strangers."

"He feels comfortable among people of the same tastes."

@Q Quantifier "@Q car|bus|motorbike"

"Those cars are red."

"Many cars are red."

All of these features can also be used by themselves to refer to any word that works as the chosen part of speech in the text. This is an example of how this works:

"You @E run"

"You have run through that park."

"You always run there."

"You rarely run around the block."

3. Negation feature

The negation feature + or - identifies if a term is affected by negation or not.

Rule Example
+enthusiastic

"I am enthusiastic about going to the bowling alley tonight."

"I am not enthusiastic about going to the bowling alley tonight."

-enthusiastic

"I am enthusiastic about going to the bowling alley tonight."

"I am not enthusiastic about going to the bowling alley tonight."

Semantic aspects

Semantic information can be used to define rules, enabling you to group words by their meaning. Currently only the semantic information defined in MeaningCloud's default resources is supported. These are the possible values:

  • Ontology entity type: it's accessed using S@[ontology type] and searches the internal semantic attribute sementity.
  • Theme information: it's accessed using T@[theme type] and searches the internal semantic attribute semtheme.
  • Geographic information: it's accessed using G@[semgeo] and searches the internal semantic attribute semgeo.
"S@Top>Product>Food" Comprises all the words tagged as Top>Product>Food or any of its descendants.

"I buy a car."

"I buy an orange."

"T@Top>Sport" Comprises all the words tagged as Top>Sport or any of its descendants.

"Writing is fun."

"Karate is fun."

"G@America" Comprises all the words tagged as having America as their geographic information.

"He was born in Spain."

"He recently visited Peru."

For the ontology entity type, both entities and concepts are taken into account unless otherwise defined. To consider just one of the two, add an additional tag to the end of the ontology type: @class for concepts and @instance for entities.
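
The "or any of its descendants" behaviour mentioned in the examples above can be pictured as a prefix check on the ontology path. The sketch below is only an approximation under that assumption. The function name in_branch and the tags "Top>Product>Food>Fruit" and "Top>Product>Vehicle" are hypothetical and are not taken from the actual ontology.

```python
def in_branch(tag: str, branch: str) -> bool:
    """Return True if tag is the branch itself or one of its descendants."""
    # Appending ">" avoids false positives such as "Top>Product>FoodTruck".
    return tag == branch or tag.startswith(branch + ">")


# Hypothetical tags, for illustration only.
print(in_branch("Top>Product>Food>Fruit", "Top>Product>Food"))  # True
print(in_branch("Top>Product>Vehicle", "Top>Product>Food"))     # False
```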

Sometimes polysemy (words with more than one meaning) works against us. To handle it, there is the _multiX parameter, which specifies whether a term has more than one meaning. It comes in three variants:

  • Ontology entity type: _multiS implies that the term has more than one sense.
  • Theme information: _multiT implies that it has more than one theme.
  • Geographic information: _multiG implies that it has more than one geo information.

Semantic aspects are explained in more detail here.

Did you notice...?

It is possible to use any ontology type you have previously defined in your user dictionaries just by adding them here.

Logical operators

Logical operators are used to associate the appearances of different terms using Boolean logic. There are three operators available: AND, OR, and AND NOT. Brackets should be used to define their precedence in a rule. Please note that these operators apply only within the pre-established context.

Operator Definition Rule Example
AND All terms have to appear and order does not matter. network AND customer|client

"The network is down."

"The client's network is down."

OR One term or the other. It can be expressed by using pipe ("|") or the operator "OR". apartment OR flat

"That's my house."

"That's my apartment."

AND NOT Exclude terms. food AND home AND NOT restaurant

"We brought food home from the restaurant."

"We don't have any food home."

Important

The operator OR must be used when the rule includes literal expressions. Example:

  • "due date"|deadline|expiration
  • "due date" OR deadline|expiration

Brackets can be used to indicate operator precedence. Example:

  • (get OR obtain) AND money
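
The three operators map directly onto Boolean logic over the terms found in a context. The Python sketch below restates the example rules from this section that way. It assumes each context has already been reduced to a set of lemmas (written by hand here), and the helper names are hypothetical.

```python
def has_any(context_lemmas: set[str], alternatives: str) -> bool:
    """True if at least one pipe-separated alternative appears in the context."""
    return any(term in context_lemmas for term in alternatives.split("|"))


def network_rule(ctx: set[str]) -> bool:
    # network AND customer|client
    return has_any(ctx, "network") and has_any(ctx, "customer|client")


def food_rule(ctx: set[str]) -> bool:
    # food AND home AND NOT restaurant
    return has_any(ctx, "food") and has_any(ctx, "home") and not has_any(ctx, "restaurant")


def money_rule(ctx: set[str]) -> bool:
    # (get OR obtain) AND money
    return (has_any(ctx, "get") or has_any(ctx, "obtain")) and has_any(ctx, "money")


# Hand-lemmatized contexts (an assumption; the engine does its own parsing).
print(network_rule({"the", "client", "network", "be", "down"}))                # True
print(network_rule({"the", "network", "be", "down"}))                          # False
print(food_rule({"we", "bring", "food", "home", "from", "the", "restaurant"}))  # False
```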

Context operators

Context operators define the context where the rules apply.

  • Exclusion operator: The :: operator omits the following term from its preceding broader range. In other words, this operator excludes some terms from a larger list.
"S@Top>LivingThing>Animal::cat|dog|hamster" It will detect all the words that fall into the "Animal" category except those that are specified afterwards.

"Her turtle's name is Vaca."

"Her cat's name is Tortuga"

  • Number of words operator: The WORDS operator delimits the number of words that the context must have in order for the rule to apply.
"woman AND WORDS<7" The rule will apply if the context has less than seven words.

"Pretty woman walking down the street."

"Pretty woman the kind I'd like to meet."

"woman AND WORDS>7" The rule will apply if the context has more than seven words.

"Pretty woman I don't believe you, you're not the truth."

"Oh, pretty woman."

"woman AND WORDS=7" The rule will apply if the context has exactly seven words.

"Pretty woman stop awhile."

"Pretty woman talk awhile."

"Pretty woman give your smile to me."

Distance operators

Distance operators allow you to define co-appearances of several words within a specific range. They can include as many terms as you need, but the distance between the first and the last term must be specified. There are two options to consider when using distance operators:

  • LENIENT: [termA termB]~number. Terms can appear in any order, as long as the distance between the first and the last of the defined terms to appear does not exceed the number given in the rule.
  • STRICT NEAR: [termA termB]-number. Terms must appear in the order given in the rule, and the distance between the first and the last of the defined terms must not exceed the number given (order matters).

Let's see how the term count works with some examples:

Rule 1: [call|dial phone|telephone]~3

Example of lenient distance operator
  • Someone is calling to the phone.
  • I dialed the new Spanish phone number.

Rule 2: [automatic|automatically renew|renewal]-1

Example of strict distance operator
  • License will be automatically renewed.
  • License will be renewed automatically
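
One way to picture the two distance operators is the Python sketch below. It assumes that distance is the difference between token positions (an inference from Rule 1, where "calling" and "phone" sit three positions apart) and that the text has already been lemmatized, here by hand. The function names are hypothetical and the engine's actual counting may differ, so the test parsing console remains the reference.

```python
from itertools import product


def positions(tokens: list[str], alternatives: str) -> list[int]:
    """Indexes of the tokens that match any pipe-separated alternative."""
    options = set(alternatives.split("|"))
    return [i for i, token in enumerate(tokens) if token in options]


def distance_match(tokens: list[str], terms: list[str],
                   max_distance: int, strict: bool) -> bool:
    """Check a [termA termB ...] rule with ~N (lenient) or -N (strict)."""
    candidates = [positions(tokens, term) for term in terms]
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # every rule term needs its own token
        if strict and list(combo) != sorted(combo):
            continue  # strict: terms must keep the order given in the rule
        if max(combo) - min(combo) <= max_distance:
            return True
    return False


# Rule 1: [call|dial phone|telephone]~3
rule_1 = ["call|dial", "phone|telephone"]
print(distance_match(["someone", "be", "call", "to", "the", "phone"], rule_1, 3, strict=False))
# True: 3 positions apart
print(distance_match(["i", "dial", "the", "new", "spanish", "phone", "number"], rule_1, 3, strict=False))
# False: 4 positions apart
```

With Rule 2, the same sketch accepts "automatically renewed" under the strict option and rejects "renewed automatically", because the terms appear in the opposite order.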

Multiwords can also be used within square brackets to include a word-to-word expression:

Rule 1: [rate|price|tax "go down"]-3

Example of strict distance operator
  • Prices usually go down.
  • Prices in Germany usually go down in winter.

Notice that only the first term of the multiword is counted when calculating the distance. However, the whole multiword must appear.

Rule 2: ["in advance" payment|pay]~3

Example of lenient distance operator
  • Payment will be made in advance.
  • I did it in advance but the payment did not appear in my account.

Rule 3: [I "my money" back]~4

Example of lenient distance operator
  • I need my money back as soon as possible.
  • I wish they gave me my damn money back.

Let's see some examples of these operators in action:

Rule Text Detects
[zone|space|section|area parking|child|children's|smoking|non-smoking|play|reserved]~5 The zone has a parking nearby.
[zone|space|section|area parking|child|children's|smoking|non-smoking|play|reserved]~5 The hotel parking is a space nearby.
[zone|space|section|area parking|child|children's|smoking|non-smoking|play|reserved]~5 The assigned parking is around the corner, in a space behind the hotel.
[speak fast|rapid|quick|quickly|slow]-2 He speaks fast.
[speak fast|rapid|quick|quickly|slow]-2 He speaks English very fast.
[speak fast|rapid|quick|quickly|slow]-2 He speaks very slowly.
[give|have "what I asked|ordered"]-3 They don't have what I ordered.
[give|have "what I asked|ordered"]-3 They don't have your purchase or what I ordered.

Did you notice...?

If you are not sure exactly how to count steps in your test, you can always use the test parsing console to check how the engine tokenizes the phrase.

Regular expressions

Regular expressions are allowed. However, you will have to escape certain characters such as +, ( ), *, and % by using "\".

Below you can find some useful regular expressions:

Regex Description Rule Detects
.*_term Matches the chosen term inside a parsing multiword. .*_president The former president will visit us tomorrow.
[0-9.,]* Matches any number, including decimal and whole numbers. A space after the number may appear or not. "[0-9.,]* ?€" The bill was 30€.
The bill was 30.5€.
The bill was 30,5€.
The bill was 30,5 €.
[list of letters] Matches any of the letters inside the square brackets. apologi[sz]e Please, apologize
Please, apologise
? The letter may appear or not. behaviou?r Good behavior
Bad behaviour
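
The patterns in this table can be tried outside the engine with Python's standard re module, as in the sketch below. It assumes that a pattern has to match a whole element, which may not be exactly how the engine applies it, and the helper name is hypothetical.

```python
import re

AMOUNT = re.compile(r"[0-9.,]* ?€")
SPELLING = re.compile(r"apologi[sz]e")
OPTIONAL_U = re.compile(r"behaviou?r")
MULTIWORD = re.compile(r".*_president")


def is_full_match(pattern: re.Pattern, text: str) -> bool:
    """True if the pattern matches the whole text, not just part of it."""
    return pattern.fullmatch(text) is not None


for sample in ["30€", "30.5€", "30,5€", "30,5 €"]:
    print(sample, is_full_match(AMOUNT, sample))  # all True

print(is_full_match(MULTIWORD, "vice_president"))  # True
print(is_full_match(MULTIWORD, "president"))       # False: no underscore
```

Note that * also accepts zero characters, so in this sketch a bare "€" matches the amount pattern too. Using + instead of * would require at least one digit or separator.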

Important

Question marks can be used with semantic or grammatical aspects between brackets, for example, (@N)? meaning "any noun may appear or not". By contrast, question marks cannot be combined with brackets to include or exclude a letter set: "beer batter(ed)?"

Other Elements

There are two external elements that can be used in the rules:

  • Macros: macros are referred to by their label between curly brackets, and can be used the same way a simple term would.
    {my_macro_label} AND words
  • Categories: categories can also be referred to within the rules by using their code preceded by #. This is especially useful for dealing with co-appearances of several categories and for disambiguating depending on the context.
    #catA AND context

Now that we've seen all the different operators and expressions that can be used in a rule, let's see some examples of how they can be combined.

Rule Text Detected?
[difficult|easy to get to]~4 It's very difficult to find the station and get to the platform.
[difficult|easy to get to]~4 It's easy to get to the airport.
[zone|section|area parking|child|children's|play]~5 The bar had a zone where children could play.
[zone|section|area parking|child|children's|play]~5 You can play in the designated zone.
[speak|talk fast|rapid|quick|quickly|slow]-2 He speaks fast.
[speak|talk fast|rapid|quick|quickly|slow]-2 He speaks English really fast.
[speak|talk fast|rapid|quick|quickly|slow]-2 That rapid talk is hard to follow.
instruction|menu|configuration|process AND easy|difficult Laptop configuration was quite easy.
"impossible to_get to" OR "impossible 2 get 2" It's impossible to get to the station.
"impossible to_get to" OR "impossible 2 get 2" It's impossible 2 get 2 the station.
[obtain|achieve|reach|get "[0-9.,]* euros|dollars"]~4 AND NOT discount I got a discount
[obtain|achieve|reach|get "[0-9.,]* euros|dollars"]~4 AND NOT discount I obtained 20 euros
[play chess \?]-4 Does he play chess?
S@Top>Location>GeoPoliticalEntity>City::_multiS He lives in London
S@Top>Location>GeoPoliticalEntity>City::_multiS That London boy is a great writer

Did you notice...?

Punctuation signs that can be used in regular expressions or to reference other elements must be escaped when used in rules, for example, [play football \?]-4 or [\#blacklivesmatter T@Top>Society>Politics]~10