Attributes

Naomi Leclercq edited this page Feb 28, 2022 · 30 revisions

Semantic Attributes

iKnow is all about identifying Entities and their context in natural language text. Entities (with the most common subtypes Concepts and Relations) are indivisible word groups that have a meaning on their own, mostly independent of the sentence they appear in. For example, in the sentence "Patient is being treated for acute pulmonary hypertension", iKnow will label the word groups "patient" and "acute pulmonary hypertension" as Concepts. These concepts are linked together by the Relation "is being treated for".

In the sentence "Patient is not being treated for acute pulmonary hypertension", the Concept "acute pulmonary hypertension" by itself still has the same meaning, but its context is clearly different. It now appears in a part of the sentence that is negated, and an application built with iKnow that flags problems should treat the Concept differently in this sentence than in the previous one. We capture such conceptual information through a semantic attribute. Negation is one type of semantic attribute and is detected for all languages supported by iKnow.

Semantic attributes in a sentence often boil down to the use of specific marker terms, such as the word "not" indicating the negation attribute in the example sentence. Marker terms are usually single words or combinations of words, but don't necessarily make up a full entity. Just identifying them is already somewhat helpful, but the true value of iKnow lies in the way the attributes are expanded left and right to include all the entities that are affected by the attribute. In the sentence "Patient is not being treated for acute pulmonary hypertension or CAD, but reports frequent chest pain", the Concepts "acute pulmonary hypertension" and "CAD" are affected by the negation attribute, but the Concept "frequent chest pain" is not. This attribute expansion is an important property of iKnow and a valuable capability for applications built with it, whether used for simple on-screen highlighting or for advanced interpretation logic.

This genesis of attributes, starting from word-level markers whose effect is expanded left and right based on linguistic rules, leads to the two levels at which attribute information can be retrieved.

  • First, marker terms are annotated at the word level. As the smallest unit in iKnow is an entity, this information is associated with entity occurrences and can be retrieved for each entity occurrence (or sentence part). For a given entity occurrence, the word-level attribute information is expressed through a bit mask, a string of zeroes and ones with each position in the string representing a word in the entity and a one meaning it is a marker term. For example, the entity "is not being treated for" will have a negation bit mask "01000".
  • When attributes expand to other entities, this information is stored at the path level, because that is the logical, semantic structure of the sentence on which the language-specific expansion rules operate. Path-level negation can be expressed through the position of the first entity in the path that is affected by attribute expansion and the span, the number of consecutive entities affected. For example, in the longer sample sentence quoted earlier, the starting position is 1 ("patient" is also affected) and the span is 5 (stretching up to and including "CAD").
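The two levels can be illustrated with a minimal Python sketch. It is not the iKnow API: the helper names `marker_bit_mask` and `affected_entities`, the marker set and the way the sentence is split into entities are all assumptions made for illustration. It only shows how a word-level bit mask and a path-level position and span could be read.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of iKnow.

NEGATION_MARKERS = {"no", "not", "without"}  # assumed sample of marker terms


def marker_bit_mask(entity, markers):
    """Return one '0' or '1' per word in the entity; '1' flags a marker term."""
    return "".join("1" if word.lower() in markers else "0" for word in entity.split())


def affected_entities(path, start, span):
    """Return the entities covered by a 1-based start position and a span."""
    return path[start - 1 : start - 1 + span]


# Assumed entity segmentation of the longer sample sentence.
path = [
    "patient",
    "is not being treated for",
    "acute pulmonary hypertension",
    "or",
    "CAD",
    "but reports",
    "frequent chest pain",
]

mask = marker_bit_mask(path[1], NEGATION_MARKERS)
negated = affected_entities(path, start=1, span=5)

print(mask)     # word-level view of "is not being treated for"
print(negated)  # path-level view: the entities inside the negation span
```

With a starting position of 1 and a span of 5, the slice ends at "CAD" and leaves "frequent chest pain" outside the negated part, matching the description above.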

Semantic Attributes supported by iKnow

Semantic Attributes come in different types that can be thought of as largely orthogonal annotations of sequences of entities in a sentence. In other words, an entity occurrence can carry any number and combination of attributes, or none at all.

An overview of which attributes are available in which language models can be found on Language model guidelines.

Negation

Linguistic negation is the most straightforward attribute supported by iKnow. Negation turns an affirmative statement into its opposite, a denial. Attribute markers for negation are mostly universal across domains and included in the language models. They include terms such as "no", "not", "without", etc.

Examination of the engines indicated [NEGATION: no evidence of a mechanical defect or ingestion of birds or ice.]
The [NEGATION: patient did not demonstrate a real cardiovascular response.]

In rare cases where domain-specific or otherwise new negation markers would need to be recognised by iKnow, you can apply a User Dictionary to which additional marker terms have been added. To remove the negation attribute from built-in negation markers, these can be added to the User Dictionary with label UDIgnoreNegation.

Positive & Negative Sentiment

Sentiment expresses a person's subjective feeling about someone or something which, in terms of iKnow artefacts, translates easily into entities being affected by this positive or negative sentiment. iKnow models sentiment through two independent attribute types: positive sentiment and negative sentiment.

Sentiment marker terms are highly dependent on the kind of texts being analysed. For example, in a customer satisfaction survey context, the following terms might be flagged with a sentiment attribute:

  • The words "disgust", "terrible", "unhappy", "worried" convey a negative sentiment.
  • The words "awesome", "beautiful", "good", "enjoyed" convey a positive sentiment.

Because sentiment terms are typically highly specific to the subject area of the source texts, most iKnow language models currently do not include any sentiment marker terms by default. You can flag individual words as marker terms having a positive sentiment or a negative sentiment attribute through a User Dictionary, in the same way as described in the section on Negation. However, the English language model does contain a limited set of built-in markers of different grammatical categories to illustrate sentiment attribute spans. To find out which words are defined as markers, look in lexreps.csv for ENPosSentiment and ENNegSentiment. To switch off this default sentiment detection, activate rule 2546 and remove "+ENPosSentiment" from rules 2561, 2562 and 2563 in rules.csv. Then rebuild the English language model. To remove the sentiment attribute from built-in sentiment markers, these can be added to the User Dictionary, either with the general label UDIgnoreSentiment, or with the specific labels UDIgnoreNegSentiment and UDIgnorePosSentiment.

For example, "liked" is specified as having a positive sentiment attribute, and "awful" is specified as having a negative sentiment attribute. This is the result when iKnow applies them to the sentence:

[POSITIVE SENTIMENT: I liked the shape], but the [NEGATIVE SENTIMENT: flashy colours were awful].

Positive sentiment would affect "shape" and negative sentiment would affect "flashy colours".

Sentiment attributes are supported for all languages except for Japanese.

Note that because of its linguistic role, (linguistic) negation may play a role in how an attribute like sentiment should be interpreted. For example, in the sentence "the opera wasn't very good", the negation marker "wasn't" affects the positive sentiment that is triggered by the positive sentiment marker "good". Some language models reverse the sentiment of the span for sentiment markers that are negated.
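As a rough sketch of how an application could take negation into account when interpreting sentiment, the Python below flips the polarity of a sentiment span whose marker falls inside a negation span. The `Span` structure, the field names and the positions used in the example are assumptions for illustration; they do not reflect how any particular language model implements the reversal.

```python
# Illustrative sketch only; Span and interpret_sentiment are hypothetical names.
from dataclasses import dataclass


@dataclass
class Span:
    attribute: str   # "negation", "positive_sentiment" or "negative_sentiment"
    start: int       # 1-based position of the first affected entity
    length: int      # number of consecutive entities affected
    marker_pos: int  # 1-based position of the entity holding the marker term

    def covers(self, pos):
        """True if the entity at the given position lies inside this span."""
        return self.start <= pos < self.start + self.length


FLIP = {
    "positive_sentiment": "negative_sentiment",
    "negative_sentiment": "positive_sentiment",
}


def interpret_sentiment(spans):
    """Return the effective polarity of each sentiment span, reversing negated ones."""
    negations = [s for s in spans if s.attribute == "negation"]
    result = []
    for s in spans:
        if s.attribute not in FLIP:
            continue
        negated = any(n.covers(s.marker_pos) for n in negations)
        result.append(FLIP[s.attribute] if negated else s.attribute)
    return result


# "the opera wasn't very good", assumed to split into three entities:
# 1 "the opera", 2 "wasn't", 3 "very good".
spans = [
    Span("negation", start=1, length=3, marker_pos=2),
    Span("positive_sentiment", start=1, length=3, marker_pos=3),
]
print(interpret_sentiment(spans))
```

Here the marker "good" sits at position 3, inside the negation span, so the positive sentiment is read as negative.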

Measurements, Time & Frequency

In many types of documents, rather concrete pieces of structured information are captured in natural language. For example, in the sentence "We need at least £5 billion spending on public transport alternatives.", there is a currency amount that users may want to extract for further interpretation. We call these structured data elements measurements, and in addition to currency amounts they include a variety of quantified expressions such as dimensions, weight expressions, medication dosages and many more. Most concrete measurements consist of a numeric expression (possibly written out in words rather than digits) and some form of unit, all of which are annotated together as marker terms at the word level.

In addition to annotating the raw number and unit, iKnow will also use attribute expansion rules to identify the other concepts "involved" in the measurement. Thanks to this expansion, the annotated sequence of concepts fully expresses what is being measured, rather than just capturing the measurement alone. For example, in the above example sentence, this expansion will include the concept "public transport alternatives". These expanded attributes are useful in both directions. In one direction, they allow a user to quickly bring up all the measurable facts in a document by highlighting them or displaying them in a list. In the other direction, if a user is looking for all mentions of a concept like "tumour" to derive tumour size information from a potentially long report, those mentions can now be limited to the fragments where the concept appears in the context of a measurement, skipping many other less relevant occurrences.
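The second direction, narrowing the mentions of a concept down to those that appear in a measurement context, could look roughly like the sketch below. The dictionary layout, the function name `mentions_in_measurement` and the two sample sentences are invented for illustration and are not iKnow output; the sketch assumes each sentence provides its entities and a list of measurement spans as 1-based (start, span) pairs.

```python
# Illustrative sketch only; the data layout and sample sentences are assumptions.

def mentions_in_measurement(sentences, concept):
    """Return the text of each measurement span that mentions the given concept."""
    hits = []
    for sentence in sentences:
        for start, span in sentence["measurement_spans"]:
            covered = sentence["entities"][start - 1 : start - 1 + span]
            if any(concept in entity for entity in covered):
                hits.append(" ".join(covered))
    return hits


sentences = [
    {   # a measurement that expands to include the concept being measured
        "entities": ["the tumour", "measures", "3 cm"],
        "measurement_spans": [(1, 3)],
    },
    {   # the same concept, but without any measurement attribute
        "entities": ["the tumour", "was first noticed in", "March"],
        "measurement_spans": [],
    },
]

print(mentions_in_measurement(sentences, "tumour"))
```

Only the first sentence is returned, because the second mention of "tumour" is not covered by a measurement span.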

Time, Duration and Frequency exist as three separate attributes. However, only the English language model distinguishes between the three. Some models use Time for all time-related entities. Extending the attributes to all language models is work in progress. For the current state of affairs, see the table on Language model guidelines. Many time-related entities don't have a span.

Extra markers for Time, Frequency and Duration, and extra units for Measurements can be added through a User Dictionary. To remove the time attribute from built-in time markers, these can be added to the User Dictionary with label UDIgnoreTime. And to remove the measurement attribute from built-in units, these can be added to the User Dictionary with label UDIgnoreUnit.

Certainty

Detection of Certainty markers and their span is available in the English language model. The implementation resembles that of Negation detection.

The [CERTAINTY: patient will probably recover within 5 days], but his medication has to be continued until the end of the month.

As with the other attributes, it's possible to define customer-specific markers in a User Dictionary. To remove the certainty attribute from built-in certainty markers, these can be added to the User Dictionary with label UDIgnoreCertainty.

More information about how to model this attribute can be found on Modeling Certainty Attribute.

Generic

In the English language model, three Generic attributes can be defined through the User Dictionary, for semantic categories defined by the user. The expansion of these attributes is based on the Sentiment expansion rules.

An example with "diagnosed" as a user-defined marker term:

[GENERIC: He was diagnosed with temporal arteritis] and treated with systemic steroids.

More information about how to model these attributes can be found on Modeling Generic Attributes.