Releases: intersystems/iknow

Extra CRC data now available from the iKnow engine

18 Feb 16:29

As of version 1.4, CRC data is part of the indexing data produced by the engine. See the wiki for more information.

C interface

08 Oct 15:34

In addition to the existing Python and C++ APIs, the iKnow engine is now available through a simple C interface, which allows the engine to be embedded in virtually any environment. The new API uses a simple JSON format for input and output, but otherwise offers much the same features as the iknowpy package. Please see the wiki for more details.

This release also includes a significant update of the Japanese language model.

Automatic Language Identification

08 Oct 15:29

We've added Automatic Language Identification to the iKnow engine and made it available through a new iknowpy function.

To use ALI when indexing text, simply pass '*' as the language name to the index() method and iKnow will figure it out for you. If all you need is the language and not the rest of the indexing results, you can use the new IdentifyLanguage() method. Check the wiki for more details!
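As a minimal sketch of the index() path described above: the helper name `index_auto_language` is hypothetical, and reading the results from the engine's `m_index` attribute is an assumption about the iknowpy API that should be checked against the wiki. The helper accepts any engine object, so it can be tried without the package installed.

```python
def index_auto_language(engine, text):
    """Index `text` and let the engine choose the language via ALI.

    `engine` is expected to behave like an iknowpy engine object
    (for example one created with iknowpy.iKnowEngine()).
    """
    # Passing '*' as the language name requests Automatic Language
    # Identification, as described in the release note above.
    engine.index(text, '*')
    # Assumption: the indexing results are exposed on `m_index`.
    return engine.m_index
```

With iknowpy installed, this would presumably be called as `index_auto_language(iknowpy.iKnowEngine(), "Ceci est un petit exemple.")`. If only the language is needed, the IdentifyLanguage() method mentioned above is the better fit.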

Generic Attributes & Japanese Measurements

15 Jul 14:37

This release builds on the infrastructure added in v1.1 and now includes attribute labels and a basic set of rules for three Generic Attributes. Developers can use these to tag use-case-specific attributes not covered by the built-in attribute types, adding their own marker terms so that attribute expansion flags the syntactically "affected" portions of a sentence. A basic set of expansion rules is included for these generic attributes.
For example, we've helped customers in the healthcare industry add marker terms such as "mother", "brother", etc. so that mentions of "family history" can be identified in the text: "Patient mentioned mother suffered a stroke 10y ago, but denied experiencing chest pain himself".

Furthermore, this release includes one of the biggest extensions of the Japanese language model to date, significantly extending its support for Attributes, including measurements and time expressions. Because a single entity in Japanese can contain several measurements, measurement spans can now include more than two value and unit pairs when necessary.

See the list of supported attributes for the most up-to-date information.

Improved Attributes, CI/CD, and more

28 Jun 08:57

This release rolls up a large number of changes applied since the first full v1.0 release:

  • Extended support for semantic attributes
  • Many improvements to the language models, especially English, Japanese and Czech
  • Enhancements to the CI/CD procedures' speed and reliability
  • Enhancements to user and developer documentation
  • Various bugfixes to previously reported issues

⚠️ The output format for sentence attributes with property values has changed slightly. See the Compatibility Notes below for details.

Semantic Attributes

The v1.1 release significantly expands iKnow's ability to identify semantic attributes in natural language text, and in particular enhances support for measurements, time and certainty. iKnow now recognizes more markers in the various supported languages and has more accurate expansion rules to identify the affected span within each sentence. Check the wiki for more details on which attributes are supported in which language.

New since v1.0 is the introduction of a Certainty attribute, which has an attribute property expressing the level of certainty. A level of 9 means an expression of absolute certainty and a level of 1 means very low confidence. While you can specify (or override) an initial level of certainty with the attribute marker definition (e.g. in the User Dictionary), rules processing may modify the value, e.g. in the context of a Negation Attribute.

This release also introduces three new Generic attributes. Developers can use these to tag use-case-specific attributes not covered by the built-in attribute types, adding their own marker terms so that attribute expansion flags the syntactically "affected" portions of a sentence. A basic set of expansion rules is included for these generic attributes.
For example, we've helped customers in the healthcare industry add marker terms such as "mother", "brother", etc. so that mentions of "family history" can be identified in the text: "Patient mentioned mother suffered a stroke 10y ago, but denied experiencing chest pain himself".

CI/CD Pipeline

The Continuous Integration / Continuous Deployment pipeline for this repository is implemented through GitHub Actions, and now includes standard unit tests as well as reference tests against a gold standard to ensure the highest quality output.

Compatibility Notes

We changed the Sent_Attribute structure emitted by the iknowpy module. The fixed set of properties used in v1.0 (value, unit, value2, unit2) has been replaced by a list of (value, unit) pairs, which allows any number of properties to be passed per sentence attribute. The v1.0 definition:

    struct Sent_Attribute:
           Attribute type "type_"
           size_t offset_start "offset_start_", offset_stop "offset_stop_"
           string marker "marker_"
           string value "value_", unit "unit_", value2 "value2_", unit2 "unit2_"
           Entity_Ref entity_ref
           Path entity_vector

was changed to:

    ctypedef vector[pair[string, string]] Sent_Attribute_Parameters
    struct Sent_Attribute:
           Attribute type "type_"
           size_t offset_start "offset_start_", offset_stop "offset_stop_"
           string marker "marker_"
           Sent_Attribute_Parameters parameters "parameters_"
           Entity_Ref entity_ref
           Path entity_vector

Existing code that reads the old fields should read the corresponding entries of the parameters list instead. The old fields map to the new structure as follows:

    sent_attribute['value'] = sent_attribute['parameters'][0][0]
    sent_attribute['unit'] = sent_attribute['parameters'][0][1]
    sent_attribute['value2'] = sent_attribute['parameters'][1][0]
    sent_attribute['unit2'] = sent_attribute['parameters'][1][1]
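The mapping above indexes the parameters list directly, which fails when an attribute carries fewer than two pairs. A small compatibility helper can guard against that. The sketch below is illustrative only: the function name `legacy_properties` is hypothetical, the sample attribute is made up, and defaulting missing fields to an empty string is an assumption about how v1.0 reported absent values.

```python
def legacy_properties(sent_attribute):
    """Return v1.0-style fields for a sentence attribute in the new format."""
    # Assumption: absent values were empty strings in the v1.0 structure.
    legacy = {'value': '', 'unit': '', 'value2': '', 'unit2': ''}
    key_pairs = [('value', 'unit'), ('value2', 'unit2')]
    # zip() stops at the shorter sequence, so attributes with zero or one
    # pair keep the defaults, and any pairs beyond the second are ignored.
    for (value_key, unit_key), (value, unit) in zip(
            key_pairs, sent_attribute['parameters']):
        legacy[value_key] = value
        legacy[unit_key] = unit
    return legacy


# Hypothetical attribute with a single (value, unit) pair.
example = {'marker': '10y', 'parameters': [('10', 'y')]}
print(legacy_properties(example))
# → {'value': '10', 'unit': 'y', 'value2': '', 'unit2': ''}
```

Because the new structure can hold more than two pairs, code that needs every value should iterate over `parameters` directly rather than rely on the four legacy fields.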

iKnow 1.0

26 Oct 16:38

First full release of the iKnow NLP library for Python:

  • Core indexing functions, identifying concepts and their context
  • Support for 11 languages (en, es, pt, fr, de, nl, sv, ja, ru, uk & cs)
  • Tuning available through the User Dictionary object
  • Full documentation, including sample sentences for all rules