ctakes relation extractor

Sean Finan edited this page Sep 21, 2024 · 10 revisions

The relation extractor annotates relations between certain Event, Entity and Modifier annotations.
The currently available models detect body site (location-of) and severity (degree-of) relations. They are machine-learning models trained on manually annotated clinical data.

Collection Readers
Annotation Engines
Utilities
Piper Files


Collection Readers

XMI Reader (3)

Reads document texts and annotations from XMI files specified in a provided list.

Source class: XMIReader
Source package: org.apache.ctakes.relationextractor.eval
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| files | The XMI files to be loaded | List | Yes | |

Annotation Engines

Causal Relation Annotator

Annotates Causal relations in sentences.

Source class: CausesBringsAboutRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Generic Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |
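
The effect of ProbabilityOfKeepingANegativeExample can be pictured as random downsampling of negative training examples. The following is a minimal sketch in plain Java, not the cTAKES implementation: the class `NegativeDownsampler`, its seed argument and the use of `"-NONE-"` as the label of a negative example are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of negative-example downsampling; not a cTAKES class.
final class NegativeDownsampler {
    // Assumed label for "no relation" candidates in this sketch.
    static final String NEGATIVE_LABEL = "-NONE-";

    private final double keepProbability;
    private final Random random;

    NegativeDownsampler(double keepProbability, long seed) {
        if (keepProbability < 0.0 || keepProbability > 1.0) {
            throw new IllegalArgumentException("keepProbability must be in [0, 1]");
        }
        this.keepProbability = keepProbability;
        this.random = new Random(seed);
    }

    // Positive examples are always kept; negatives are kept with the configured probability.
    boolean keep(boolean isPositive) {
        return isPositive || random.nextDouble() < keepProbability;
    }

    // Returns the labels that survive downsampling, in their original order.
    List<String> filter(List<String> labels) {
        List<String> kept = new ArrayList<>();
        for (String label : labels) {
            if (keep(!NEGATIVE_LABEL.equals(label))) {
                kept.add(label);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("degree_of", "-NONE-", "-NONE-", "-NONE-");
        // With probability 0.5 roughly half of the negatives are expected to be dropped.
        System.out.println(new NegativeDownsampler(0.5, 1L).filter(labels));
    }
}
```

A value of 1.0 keeps every negative example and a value of 0.0 drops them all, while positive examples are never discarded in this sketch.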

Degree of Annotator

Annotates Degree Of relations.

Source class: DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Degree of Annotator 1

Annotates Degree Of relations in sentences containing a single entity mention of a valid degree_of type and a single modifier.

Source class: Baseline1DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.DegreeOfRelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |
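
The rule used by Degree of Annotator 1 (one valid entity mention and one modifier in the sentence) can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the cTAKES source: `SinglePairDegreeBaseline` and its `pair` method are hypothetical names, the inputs are plain strings rather than UIMA annotations, and the caller is assumed to have already filtered the entities down to valid degree_of types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the "single entity, single modifier" baseline rule.
final class SinglePairDegreeBaseline {
    private SinglePairDegreeBaseline() {}

    // Returns a relation only when the sentence holds exactly one entity and one modifier.
    static Optional<String> pair(List<String> entities, List<String> modifiers) {
        if (entities.size() == 1 && modifiers.size() == 1) {
            return Optional.of("degree_of(" + entities.get(0) + ", " + modifiers.get(0) + ")");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // One entity and one modifier: a relation is proposed.
        System.out.println(pair(Arrays.asList("pain"), Arrays.asList("severe")));
        // Two entities: the rule does not apply, so nothing is proposed.
        System.out.println(pair(Arrays.asList("pain", "nausea"), Arrays.asList("severe")));
    }
}
```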

Degree of Annotator 2

Annotates Degree Of relations between the two closest entities in sentences with multiple modifiers.

Source class: Baseline2DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Degree of Annotator 3

Annotates Degree Of relations between the two closest entities in a sentence, as long as there is no intervening modifier.

Source class: Baseline3DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Degree of Annotator 4

Annotates Degree Of relations between two entities whenever they are enclosed within the same noun phrase.

Source class: Baseline4DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.DegreeOfRelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |
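
The "same noun phrase" condition of Degree of Annotator 4 amounts to a span-containment check. The sketch below shows one way to express it in plain Java. It is not taken from the cTAKES source: `SameNounPhraseBaseline`, `Span` and `inSameNounPhrase` are hypothetical names, and character offsets stand in for the UIMA annotations the real annotator works on.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: two mentions are related when one noun phrase covers both.
final class SameNounPhraseBaseline {
    private SameNounPhraseBaseline() {}

    // Half-open character span [begin, end) within a sentence.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        // True when this span fully encloses the other span.
        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }
    }

    // True when at least one noun phrase encloses both mentions.
    static boolean inSameNounPhrase(Span first, Span second, List<Span> nounPhrases) {
        for (Span nounPhrase : nounPhrases) {
            if (nounPhrase.covers(first) && nounPhrase.covers(second)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Offsets refer to the sentence "severe pain in the left knee".
        List<Span> nounPhrases = Arrays.asList(new Span(0, 11), new Span(15, 28));
        Span severe = new Span(0, 6);
        Span pain = new Span(7, 11);
        Span leftKnee = new Span(19, 28);
        System.out.println(inSameNounPhrase(severe, pain, nounPhrases));
        System.out.println(inSameNounPhrase(severe, leftKnee, nounPhrases));
    }
}
```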

Location of Annotator

Annotates Location Of relations.

Source class: LocationOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Location of Annotator

Annotates Location Of relations. This is the thread-safe variant of LocationOfRelationExtractorAnnotator.

Source class: ThreadSafeLocationExtractor
Source package: org.apache.ctakes.relationextractor.concurrent
Parent class: org.apache.ctakes.relationextractor.ae.LocationOfRelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Location of Annotator 1

Annotates Location Of relations in sentences containing exactly two entities (where the entities are of the correct types).

Source class: Baseline1EntityMentionPairRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Location of Annotator 2

Annotates Location Of relations in sentences with multiple anatomical sites.

Source class: Baseline2EntityMentionPairRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Location of Annotator 3

Links each anatomical site with the closest entity of a type that's suitable for location_of, as long as there is no intervening anatomical site.

Source class: Baseline3EntityMentionPairRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |
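
The heuristic described for Location of Annotator 3 (closest suitable entity, no anatomical site in between) can be sketched as follows. This is a plain-Java illustration, not the cTAKES implementation: `ClosestEntityBaseline`, `Span`, `gap`, `hasInterveningSite` and `link` are hypothetical names, distance is measured in characters, and the argument order of the printed relation is chosen for readability only. The entities passed in are assumed to be already filtered to types suitable for location_of.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "closest entity with no intervening anatomical site".
final class ClosestEntityBaseline {
    private ClosestEntityBaseline() {}

    // A mention with its covered text and half-open character offsets.
    static final class Span {
        final String text;
        final int begin;
        final int end;

        Span(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    // Number of characters between two spans; 0 when they touch or overlap.
    static int gap(Span a, Span b) {
        return Math.max(0, Math.max(a.begin, b.begin) - Math.min(a.end, b.end));
    }

    // True when another site lies entirely between the given site and entity.
    static boolean hasInterveningSite(Span site, Span entity, List<Span> sites) {
        int left = Math.min(site.end, entity.end);
        int right = Math.max(site.begin, entity.begin);
        for (Span other : sites) {
            if (other != site && other.begin >= left && other.end <= right) {
                return true;
            }
        }
        return false;
    }

    // Links each site to its closest entity unless another site sits in between.
    static List<String> link(List<Span> sites, List<Span> entities) {
        List<String> relations = new ArrayList<>();
        for (Span site : sites) {
            Span closest = null;
            for (Span entity : entities) {
                if (closest == null || gap(site, entity) < gap(site, closest)) {
                    closest = entity;
                }
            }
            if (closest != null && !hasInterveningSite(site, closest, sites)) {
                relations.add("location_of(" + closest.text + ", " + site.text + ")");
            }
        }
        return relations;
    }

    public static void main(String[] args) {
        // Offsets refer to the sentence "pain in the left knee and right hip".
        List<Span> sites = Arrays.asList(new Span("left knee", 12, 21), new Span("right hip", 26, 35));
        List<Span> entities = Arrays.asList(new Span("pain", 0, 4));
        System.out.println(link(sites, entities));
    }
}
```

In the example sentence only "left knee" is linked to "pain", because "left knee" lies between "pain" and "right hip".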

Location of Annotator 4

Annotates Location Of relations between two entities whenever they are enclosed within the same noun phrase.

Source class: Baseline4EntityMentionPairRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Manages / Treats Annotator

Annotates Manages / Treats relations.

Source class: ManagesTreatsRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Generic Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Manifestation of Annotator

Annotates Manifestation Of relations.

Source class: ManifestationOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Generic Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Modifier Extractor

Annotates Modifiers and Chunks.

Source class: ModifierExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.cleartk.ml.CleartkAnnotator
Dependencies: Base Token, Sentence
Products: Identified Annotation, Chunk

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |

Thread safe Degree of Annotator

Annotates Degree Of relations.

Source class: ThreadSafeDegreeExtractor
Source package: org.apache.ctakes.relationextractor.concurrent
Parent class: org.apache.ctakes.relationextractor.ae.DegreeOfRelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
| ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |

Thread safe Modifier Extractor

Annotates Modifiers and Chunks.

Source class: ThreadSafeModifierExtractor
Source package: org.apache.ctakes.relationextractor.concurrent
Parent class: org.apache.ctakes.relationextractor.ae.ModifierExtractorAnnotator
Dependencies: Base Token, Sentence
Products: Identified Annotation, Chunk

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
| dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
| isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |

Utilities

Anafora XML Reader (Metastasis)

Reads annotations from DeepPhe schema Anafora XML files in a directory.

Source class: MetastasisAnaforaXMLReader
Source package: org.apache.ctakes.relationextractor.metastasis
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Products: Identified Annotation, Location Relation

No available configuration parameters.

Gold Annotation Copier

Copies an annotation type from the Gold view to the System view.

Source class: CopyFromGold
Source package: org.apache.ctakes.relationextractor.eval
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase

| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| AnnotationClasses | | Class[] | Yes | |
| GoldViewName | | String | Yes | |

Gold Stats Calculator

Counts various statistics, such as token and relation counts, based on the gold standard data.

Source class: GoldAnnotationStatsCalculator
Source package: org.apache.ctakes.relationextractor.data
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Sentence, Base Token, Identified Annotation, Generic Relation, Location Relation, Degree Relation

No available configuration parameters.

Identified Annotation Expander

Enlarges the text span of an identified annotation based upon part of speech.

Source class: IdentifiedAnnotationExpander
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Identified Annotation

No available configuration parameters.


Piper Files

Default Relation Pipeline

Clinical Pipeline with degree-of and location-of relations.

$\textcolor{gray}{\textsf{// Clinical Pipeline with degree-of and location-of relations. }}$

$\textcolor{gray}{\textsf{// Default Clinical Pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ DefaultFastPipeline

$\textcolor{gray}{\textsf{// degree-of, relation-of }}$
$\textcolor{magenta}{\textbf{load}}$ RelationSubPipe

Relation Sub Pipe

Commands and parameters to create a default relation extraction sub-pipeline.

$\textcolor{gray}{\textsf{// Commands and parameters to create a default relation extraction sub-pipeline. }}$
$\textcolor{gray}{\textsf{// This is not a full pipeline. }}$

$\textcolor{gray}{\textsf{// Modifiers. Use addLogged to log start and finish of processing. There aren't default models, so set specifically }}$
$\textcolor{green}{\textbf{add}}$ ModifierExtractorAnnotator $\textcolor{purple}{\textbf{classifierJarPath}}$= $\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/modifier\_extractor/model.jar}}$

$\textcolor{gray}{\textsf{// Degree of severity, etc. }}$
$\textcolor{green}{\textbf{add}}$ DegreeOfRelationExtractorAnnotator $\textcolor{purple}{\textbf{classifierJarPath}}$= $\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/degree\_of/model.jar}}$

$\textcolor{gray}{\textsf{// Location. }}$
$\textcolor{green}{\textbf{add}}$ LocationOfRelationExtractorAnnotator $\textcolor{purple}{\textbf{classifierJarPath}}$= $\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/location\_of/model.jar}}$

Sectioned Relation Pipeline

Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations.

$\textcolor{gray}{\textsf{// Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations }}$

$\textcolor{gray}{\textsf{// Default Clinical Pipeline with section, paragraph and list detection }}$
$\textcolor{magenta}{\textbf{load}}$ SectionedFastPipeline

$\textcolor{gray}{\textsf{// degree-of, relation-of }}$
$\textcolor{magenta}{\textbf{load}}$ RelationSubPipe

Ts Default Relation Pipeline

Thread Safe Default Clinical Pipeline with degree-of and location-of relations.

$\textcolor{gray}{\textsf{// Thread Safe Default Clinical Pipeline with degree-of and location-of relations }}$

$\textcolor{gray}{\textsf{// Default Clinical Pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ TsDefaultFastPipeline

$\textcolor{gray}{\textsf{// degree-of, relation-of }}$
$\textcolor{magenta}{\textbf{load}}$ TsRelationSubPipe

Ts Relation Sub Pipe

Commands and parameters to create a relation extraction sub-pipeline.

$\textcolor{gray}{\textsf{// Commands and parameters to create a relation extraction sub-pipeline. }}$
$\textcolor{gray}{\textsf{// This is not a full pipeline. }}$

$\textcolor{gray}{\textsf{// Modifiers. Use addLogged to log start and finish of processing. There aren't default models, so set specifically }}$
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeModifierExtractor}}$ $\textcolor{purple}{\textbf{classifierJarPath}}$= $\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/modifier\_extractor/model.jar}}$

$\textcolor{gray}{\textsf{// Degree of severity, etc. }}$
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeDegreeExtractor}}$ $\textcolor{purple}{\textbf{classifierJarPath}}$= $\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/degree\_of/model.jar}}$

$\textcolor{gray}{\textsf{// Location. }}$
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeLocationExtractor}}$ $\textcolor{purple}{\textbf{classifierJarPath}}$= $\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/location\_of/model.jar}}$

Ts Sectioned Relation Pipeline

Thread Safe Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations.

$\textcolor{gray}{\textsf{// Thread Safe Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations. }}$

$\textcolor{gray}{\textsf{// Default Clinical Pipeline with section, paragraph and list detection }}$
$\textcolor{magenta}{\textbf{load}}$ TsSectionedFastPipeline

$\textcolor{gray}{\textsf{// degree-of, relation-of }}$
$\textcolor{magenta}{\textbf{load}}$ TsRelationSubPipe
