
core_CR

Johnsd11 edited this page Mar 6, 2023 · 5 revisions

FileTreeReader

Reads document texts from text files in a directory tree.

| Parameter name | Parameter description | Example Values | Default | Mandatory |
|---|---|---|---|---|
| WriteBanner | Write a large banner at each major step of the pipeline. | | | false |
| InputDirectory | Directory for all input files. | | | true |
| Encoding | The character encoding used by the input files. | | | false |
| Extensions | The extensions of the files that the collection reader will read. | | | false |
| KeepCR | Keep Windows-format carriage return characters at line endings. This will only keep existing characters. | | | false |
| CRtoSpace | Change Windows-format CR + LF character sequences to LF + space. | | | false |
| PatientLevel | The level in the directory hierarchy at which patient identifiers exist. Default value is 1; directly under root input directory. | | 1 | false |
| StripQuotes | Replace document-enclosing quote characters with space characters. | | | false |
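
The effect of KeepCR and CRtoSpace on line endings can be illustrated with a short sketch. This is not the reader's actual implementation: the function name `normalize_line_endings`, the precedence of KeepCR over CRtoSpace, and the LF + space replacement are assumptions drawn from the parameter descriptions above.

```python
def normalize_line_endings(text: str, *, keep_cr: bool, cr_to_space: bool) -> str:
    """Illustrative handling of Windows-format line endings (hypothetical helper).

    Assumed behavior:
      keep_cr=True      -> text is returned unchanged; no CR is added or removed.
      cr_to_space=True  -> each CR + LF pair becomes LF + space.
      otherwise         -> each CR + LF pair becomes a single LF.
    """
    if keep_cr:
        # Only existing carriage returns are kept; none are inserted.
        return text
    if cr_to_space:
        # Two characters are replaced by two characters, so the length is unchanged.
        return text.replace("\r\n", "\n ")
    return text.replace("\r\n", "\n")


sample = "BP 120/80\r\nHR 72"
kept = normalize_line_endings(sample, keep_cr=True, cr_to_space=False)
spaced = normalize_line_endings(sample, keep_cr=False, cr_to_space=True)
stripped = normalize_line_endings(sample, keep_cr=False, cr_to_space=False)
```

In this sketch the CRtoSpace variant preserves the character count of the text, while dropping the CR shortens the text by one character per line ending.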

JdbcCollectionReader

Reads document texts from database text fields.

| Parameter name | Parameter description | Example Values | Default | Mandatory |
|---|---|---|---|---|
| SqlStatement | SQL statement to retrieve the document. | | | true |
| DocTextColName | Name of the result set column that contains the document text. | | | true |
| DbConnResrcName | Name of the external resource for the database connection. | | | true |
| DocIdColNames | Column names that will be used to form a document ID. | | | false |
| DocIdDelimiter | Delimiter used when the document ID is built. | | | false |
| ValueFileResrcName | Name of the external resource for the prepared statement value file. | | | false |
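
How SqlStatement, DocTextColName, DocIdColNames, and DocIdDelimiter relate to one another can be sketched with an in-memory SQLite database. This is an illustration only, not the reader's code: the function `read_documents`, the `notes` table, its column names, and the `_` delimiter are all hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple


def read_documents(
    conn: sqlite3.Connection,
    sql_statement: str,
    doc_text_col: str,
    doc_id_cols: List[str],
    doc_id_delimiter: str,
) -> Iterator[Tuple[str, str]]:
    """Yield (document ID, document text) pairs from a query result (hypothetical helper)."""
    cursor = conn.execute(sql_statement)
    # The first item of each cursor.description entry is the column name.
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        # The document ID joins the chosen column values with the delimiter.
        doc_id = doc_id_delimiter.join(str(record[col]) for col in doc_id_cols)
        yield doc_id, record[doc_text_col]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (patient_id INTEGER, note_id INTEGER, note_text TEXT)")
conn.execute("INSERT INTO notes VALUES (7, 1, 'Patient denies chest pain.')")
conn.execute("INSERT INTO notes VALUES (7, 2, 'Follow-up in two weeks.')")

docs = list(
    read_documents(
        conn,
        sql_statement="SELECT patient_id, note_id, note_text FROM notes ORDER BY note_id",
        doc_text_col="note_text",
        doc_id_cols=["patient_id", "note_id"],
        doc_id_delimiter="_",
    )
)
```

Here each row yields one document, and the two ID columns are combined into IDs such as `7_1` and `7_2`.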

LuceneCollectionReader

Reads document texts from Lucene text fields.

| Parameter name | Parameter description | Example Values | Default | Mandatory |
|---|---|---|---|---|
| IndexDirectory | Location of the Lucene index. | | | true |
| FieldName | Field to look in for document text. | | | false |
| MaxWords | Maximum number of words to process (approximate; actually based on characters). | | | true |

TextReader

Reads document texts from text files specified in a provided list.

| Parameter name | Parameter description | Example Values | Default | Mandatory |
|---|---|---|---|---|
| files | The text files to be loaded. | | | true |

XMIReader

Reads document texts and annotations from XMI files specified in a provided list.

| Parameter name | Parameter description | Example Values | Default | Mandatory |
|---|---|---|---|---|
| files | The XMI files to be loaded. | | | true |

XmiTreeReader

Reads document texts and annotations from XMI files in a directory tree.

| Parameter name | Parameter description | Example Values | Default | Mandatory |
|---|---|---|---|---|
| WriteBanner | Write a large banner at each major step of the pipeline. | | | false |
| InputDirectory | Directory for all input files. | | | true |
| Encoding | The character encoding used by the input files. | | | false |
| Extensions | The extensions of the files that the collection reader will read. | | | false |
| KeepCR | Keep Windows-format carriage return characters at line endings. This will only keep existing characters. | | | false |
| CRtoSpace | Change Windows-format CR + LF character sequences to LF + space. | | | false |
| PatientLevel | The level in the directory hierarchy at which patient identifiers exist. Default value is 1; directly under root input directory. | | 1 | false |
| StripQuotes | Replace document-enclosing quote characters with space characters. | | | false |
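
The meaning of PatientLevel can be illustrated with a small path calculation. This is a sketch under assumptions, not the reader's implementation: the helper `patient_id_for`, the example directory layout, and the choice to return `None` when the level does not exist are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def patient_id_for(root: str, file_path: str, patient_level: int = 1) -> Optional[str]:
    """Return the directory name found patient_level levels below root (hypothetical helper).

    Level 1 is the directory directly under the root input directory.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(root))
    directories = relative.parts[:-1]  # drop the file name, keep directory components
    if patient_level < 1 or patient_level > len(directories):
        # Assumption of this sketch: no patient identifier when the level is absent.
        return None
    return directories[patient_level - 1]


root = "/data/xmi"
level_one = patient_id_for(root, "/data/xmi/patient42/2023/note1.xmi", patient_level=1)
level_two = patient_id_for(root, "/data/xmi/siteA/patient42/note1.xmi", patient_level=2)
no_level = patient_id_for(root, "/data/xmi/note1.xmi", patient_level=1)
```

With the default level of 1, a file stored under `patient42/2023/` is attributed to `patient42`; with a level of 2, the layout `siteA/patient42/` yields the same identifier.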