core_CR

FileTreeReader

Reads document texts from text files in a directory tree.

Parameter name	Parameter description	Mandatory
WriteBanner	Write a large banner at each major step of the pipeline.	false
InputDirectory	Directory for all input files.	true
Encoding	The character encoding used by the input files.	false
Extensions	The extensions of the files that the collection reader will read.	false
KeepCR	Keep Windows-format carriage return characters at line endings. This only keeps existing characters; it does not add them.	false
CRtoSpace	Change Windows-format CR + LF character sequences to LF + space.	false
PatientLevel	The level in the directory hierarchy at which patient identifiers exist. The default value is 1, meaning directly under the root input directory.	false
StripQuotes	Replace document-enclosing quote characters with space characters.	false
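
The KeepCR, CRtoSpace and StripQuotes options can be read as a small normalization step applied to each document text. The following is a minimal sketch of one plausible reading, not the reader's actual code. The class `LineEndingSketch` and its `normalize` method are hypothetical. It assumes that CRtoSpace has no effect when KeepCR is set, and that StripQuotes applies only when the whole text starts and ends with a double quote.

```java
// Hypothetical sketch of the option semantics; not the FileTreeReader implementation.
final class LineEndingSketch {

    /**
     * Applies the KeepCR, CRtoSpace and StripQuotes options to a document text.
     * Assumption: when keepCr is true, crToSpace is ignored.
     */
    static String normalize(String text, boolean keepCr, boolean crToSpace, boolean stripQuotes) {
        String result = text;
        if (!keepCr) {
            // CR + LF becomes LF + space (same length) or a bare LF (one character shorter).
            result = result.replace("\r\n", crToSpace ? "\n " : "\n");
        }
        if (stripQuotes && result.length() >= 2
                && result.startsWith("\"") && result.endsWith("\"")) {
            // Replace the enclosing quotes with spaces so the text length is unchanged.
            result = " " + result.substring(1, result.length() - 1) + " ";
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "\"Line one\r\nLine two\"";
        String cleaned = normalize(raw, false, true, true);
        // With CRtoSpace and StripQuotes both on, the text length does not change.
        System.out.println(cleaned.length() == raw.length()); // prints "true"
    }
}
```

Replacing CR + LF with LF + space, and quotes with spaces, keeps every later character at the same position. That matters if character positions in the processed text need to line up with the original file.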

Reads document texts from database text fields.

Parameter name	Parameter description	Mandatory
SqlStatement	SQL statement to retrieve the document.	true
DocTextColName	Name of the column in the result set that contains the document text.	true
DbConnResrcName	Name of external resource for database connection.	true
DocIdColNames	Specifies column names that will be used to form a document ID.	false
DocIdDelimiter	Specifies delimiter used when document ID is built.	false
ValueFileResrcName	Name of external resource for prepared statement value file.	false
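
DocIdColNames and DocIdDelimiter together suggest that a document ID is formed by joining the values of the named columns. A minimal sketch of that idea follows, assuming one result row is already available as a map from column name to value. The class `DocIdSketch`, the column names and the underscore delimiter are illustrative and not taken from the reader.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch; the real reader works on a JDBC result set.
final class DocIdSketch {

    /** Joins the values of the given columns, in order, using the delimiter. */
    static String buildDocId(Map<String, String> row, List<String> docIdColNames, String delimiter) {
        StringJoiner id = new StringJoiner(delimiter);
        for (String name : docIdColNames) {
            if (!row.containsKey(name)) {
                throw new IllegalArgumentException("No such column: " + name);
            }
            id.add(String.valueOf(row.get(name)));
        }
        return id.toString();
    }

    public static void main(String[] args) {
        // Illustrative row; these column names are assumptions.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("patient_id", "42");
        row.put("note_id", "7");
        row.put("note_text", "Patient reports no pain.");
        System.out.println(buildDocId(row, Arrays.asList("patient_id", "note_id"), "_")); // prints 42_7
    }
}
```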

Reads document texts from Lucene text fields.

Parameter name	Parameter description	Mandatory
IndexDirectory	Location of the Lucene index	true
FieldName	Field to look in for document text	false
MaxWords	Maximum number of words to process (approximate; actually based on characters)	true
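
The MaxWords description says the limit is approximate and based on characters. One way such a limit could work is to convert the word budget into a character budget and cut the text there. The sketch below illustrates that idea only. The class `MaxWordsSketch` and the figure of six characters per word are assumptions made for this example, not values taken from the reader.

```java
// Hypothetical sketch of a character-based word limit; not the reader's implementation.
final class MaxWordsSketch {

    // Assumed average: five letters plus one separating space per word. Illustrative only.
    static final int ASSUMED_CHARS_PER_WORD = 6;

    /** Truncates text to roughly maxWords words by applying a character budget. */
    static String truncate(String text, int maxWords) {
        if (maxWords < 0) {
            throw new IllegalArgumentException("maxWords must not be negative");
        }
        long budget = (long) maxWords * ASSUMED_CHARS_PER_WORD;
        if (text.length() <= budget) {
            return text;
        }
        // budget is smaller than text.length() here, so the cast is safe.
        return text.substring(0, (int) budget);
    }

    public static void main(String[] args) {
        System.out.println(truncate("alpha beta gamma delta", 2)); // keeps the first 12 characters
    }
}
```

A cut made this way can fall in the middle of a word, which is one reason the resulting word count is only approximate.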

Reads document texts from text files specified in a provided list.

Parameter name	Parameter description	Example Values	Default	Mandatory
files	The text files to be loaded			true
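
Reading texts from a provided list amounts to loading each listed file in order. The following is a minimal sketch using only the Java standard library. The class `FileListSketch` is hypothetical, and UTF-8 is an assumption, since no Encoding parameter is listed for this reader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real reader places each text in a CAS rather than a list.
final class FileListSketch {

    /** Returns the text of each file, in the order the files were given. */
    static List<String> readAll(List<Path> files) {
        List<String> texts = new ArrayList<>();
        for (Path file : files) {
            try {
                // Assumption: files are UTF-8 encoded.
                texts.add(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException("Could not read " + file, e);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Each command-line argument is treated as the path of one text file.
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        for (String text : readAll(files)) {
            System.out.println(text.length() + " characters read");
        }
    }
}
```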

Reads document texts and annotations from XMI files specified in a provided list.

Parameter name	Parameter description	Example Values	Default	Mandatory
files	The XMI files to be loaded			true

Reads document texts and annotations from XMI files in a directory tree.

Parameter name	Parameter description	Mandatory
WriteBanner	Write a large banner at each major step of the pipeline.	false
InputDirectory	Directory for all input files.	true
Encoding	The character encoding used by the input files.	false
Extensions	The extensions of the files that the collection reader will read.	false
KeepCR	Keep Windows-format carriage return characters at line endings. This only keeps existing characters; it does not add them.	false
CRtoSpace	Change Windows-format CR + LF character sequences to LF + space.	false
PatientLevel	The level in the directory hierarchy at which patient identifiers exist. The default value is 1, meaning directly under the root input directory.	false
StripQuotes	Replace document-enclosing quote characters with space characters.	false
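
PatientLevel counts directory levels below the root input directory, with level 1 being the directory directly under the root. The sketch below shows how a patient identifier could be taken from a file path under that reading. The class `PatientLevelSketch` and the example paths are hypothetical, and it assumes the identifier is simply the directory name at the given level.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the PatientLevel semantics; not the reader's implementation.
final class PatientLevelSketch {

    /** Returns the name of the directory at patientLevel below inputDirectory. */
    static String patientId(Path inputDirectory, Path file, int patientLevel) {
        Path relative = inputDirectory.relativize(file);
        // The last name element is the file itself, so it does not count as a level.
        int directoryLevels = relative.getNameCount() - 1;
        if (patientLevel < 1 || patientLevel > directoryLevels) {
            throw new IllegalArgumentException(
                    "PatientLevel " + patientLevel + " is outside 1.." + directoryLevels);
        }
        return relative.getName(patientLevel - 1).toString();
    }

    public static void main(String[] args) {
        Path root = Paths.get("data", "notes");
        Path file = Paths.get("data", "notes", "patient42", "2020", "note1.xmi");
        System.out.println(patientId(root, file, 1)); // prints patient42
    }
}
```

With the default level of 1, a layout of one directory per patient directly under InputDirectory needs no extra configuration.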