CTB - Custom Taxonomy Builder

Description

Given a list of terms and a set of UMLS files, CTB generates a subset of the UMLS containing the supplied terms and their word-based variants.

Inputs

The following UMLS files should be placed in the data/input/<your data set name> directory:

  • MRCONSO.RRF: concepts file
  • MRSTY.RRF: concept-to-semantic-type mapping file

Supplied to Web Interface

  • list of supplied terms

Outputs

  • Custom version of mrconso.rrf
  • Custom version of mrsty.rrf
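To give a rough feel for what a subset of MRCONSO.RRF looks like, the sketch below keeps only the rows whose term string matches a supplied term exactly (ignoring case). It relies on the standard pipe-delimited MRCONSO.RRF layout, in which the term string (STR) is the 15th field. The function name filter_mrconso is hypothetical and this is not how CTB works internally: CTB uses prebuilt indexes and also adds word-based variants, which this sketch does not attempt.

```shell
# Hypothetical helper (not part of CTB): print the MRCONSO.RRF rows whose
# STR column (field 15) exactly matches a term from the term list,
# ignoring case. No variant generation is performed.
# Usage: filter_mrconso <term list file> <MRCONSO.RRF>
filter_mrconso() {
    local terms=$1 mrconso=$2
    awk -F'|' '
        # First file: remember each non-empty term, lower-cased.
        NR == FNR { if ($0 != "") wanted[tolower($0)] = 1; next }
        # Second file: keep rows whose STR field is a wanted term.
        tolower($15) in wanted
    ' "$terms" "$mrconso"
}
```

The term list is expected to hold one term per line and to be non-empty.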

Usage

To use CTB you must first create indexes of your UMLS files and then start the tool.

Prepare Knowledge Sources

Copy MRCONSO.RRF and MRSTY.RRF to ctb/data/input/<your data set name>/.

In the ctb directory run:

bin/prepumls.sh 'your data set name'

For example:

bin/prepumls.sh 2016AA

Note: When using the GitHub release, the name and path of the standalone jar vary with the version in the project.clj file and the version of Leiningen used. The CLASSPATH variable in the script bin/prepumls.sh must be modified to match the current location of the standalone jar (or uberjar).
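The copy step can be sketched as a small shell function that refuses to proceed unless both UMLS files are present. The function name stage_umls_inputs and its argument order are hypothetical; only the file names and the data/input/<your data set name> layout come from the instructions above.

```shell
# Hypothetical helper (not part of CTB): copy the two required UMLS files
# into <ctb dir>/data/input/<data set name>/ and print that directory.
# Usage: stage_umls_inputs <source dir> <data set name> [ctb dir]
stage_umls_inputs() {
    local src_dir=$1 dataset=$2 ctb_dir=${3:-.}
    local dest="$ctb_dir/data/input/$dataset" f

    # Fail early if either required file is missing.
    for f in MRCONSO.RRF MRSTY.RRF; do
        if [ ! -f "$src_dir/$f" ]; then
            echo "missing $src_dir/$f" >&2
            return 1
        fi
    done

    mkdir -p "$dest" || return 1
    cp "$src_dir/MRCONSO.RRF" "$src_dir/MRSTY.RRF" "$dest/" || return 1
    echo "$dest"
}

# Example, run from the ctb directory (the source path is a placeholder):
#   stage_umls_inputs /path/to/umls/META 2016AA && bin/prepumls.sh 2016AA
```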

Update the system configuration file

There should be a file called ctb.properties in the config directory. In ctb.properties change:

ctb.ivf.dataroot: ...

to:

ctb.ivf.dataroot: data/ivf/<your data set name>
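If you prefer to script this edit, the following is a minimal sketch using sed. It assumes the property appears at the start of a line in the form shown above; the function name set_dataroot is hypothetical.

```shell
# Hypothetical helper (not part of CTB): point ctb.ivf.dataroot at
# data/ivf/<data set name> in the given properties file.
# Usage: set_dataroot <properties file> <data set name>
set_dataroot() {
    local props=$1 dataset=$2
    local tmp="$props.tmp.$$"
    # The data set name is pasted into the sed replacement text, so it
    # must not contain '|', '&' or '\'.
    sed "s|^ctb\.ivf\.dataroot:.*|ctb.ivf.dataroot: data/ivf/$dataset|" "$props" > "$tmp" &&
        mv "$tmp" "$props"
}

# Example, run from the ctb directory:
#   set_dataroot config/ctb.properties 2016AA
```

All other lines in the file are left unchanged. If the property is missing entirely, the file is rewritten without modification, so it is worth checking the result.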

Adding LVG to configuration file for term expansion

If you want to use the Lexical Variant Generator (LVG) from the Lexical Tools to supply term combinations not found in the UMLS, download LVG from the Lexical Systems Group website (https://lsg3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/web/index.html) and install it according to its directions. After installing the Lexical Tools, add the following to the ctb.properties file:

ctb.lvg.directory: {LVGDIR}

where {LVGDIR} is the location of your LVG installation.

Missing directories when using the GitHub release

If you are using the GitHub release of CTB, you will need to create a directory for the output:

mkdir -p resources/public/output

Start up system

In the top-level ctb directory run:

java -jar target/ctb-0.1.3-SNAPSHOT-standalone.jar [port]

Note: When using the GitHub release, the name and path of the standalone jar vary with the version in the project.clj file and the version of Leiningen used.

Or, if you have Leiningen:

lein ring server [port]

Then point your web browser to localhost:3000 (or, if you supplied a port number, to that port).
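Because the jar's name and location vary, it can be convenient to locate it rather than type the name by hand. The sketch below searches the target directory recursively; the pattern ctb-*-standalone.jar is an assumption based on the file name in the example above, and the function name find_standalone_jar is hypothetical.

```shell
# Hypothetical helper (not part of CTB): print the path of a standalone
# jar found under the given directory (default: target).
# Usage: find_standalone_jar [directory]
find_standalone_jar() {
    local dir=${1:-target}
    # Plain lexicographic sort: if several jars are present, the last one
    # printed is not necessarily the newest version.
    find "$dir" -type f -name 'ctb-*-standalone.jar' | sort | tail -n 1
}

# Example, run from the top-level ctb directory:
#   jar=$(find_standalone_jar) && [ -n "$jar" ] && java -jar "$jar" 3000
```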

Supply Term List

Paste your term list into the "Input Terms" (first) page and press "Submit".

Filter synonyms

In the Synonym Set View, select or deselect terms to filter the synonyms generated by the tool, then press "Submit".

Generate Data Set

The generated dataset will be placed in the directory resources/public/output/user//.

The directory should contain the following files:

filtered-synset
filtered-termlist.edn
mrconso.rrf
mrsty.rrf
params
synonyms.checksum
termlist
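A quick way to confirm that a run produced everything is to check the output directory against the list above. This is a minimal sketch; the function name missing_outputs is hypothetical, and the directory to check is whatever output directory your run created.

```shell
# Hypothetical helper (not part of CTB): print the name of each expected
# output file that is absent from the given directory. Returns 0 when
# all files are present and 1 otherwise.
# Usage: missing_outputs <output directory>
missing_outputs() {
    local dir=$1 f status=0
    for f in filtered-synset filtered-termlist.edn mrconso.rrf mrsty.rrf \
             params synonyms.checksum termlist; do
        if [ ! -e "$dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return $status
}
```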

For Users of the GitHub release

Both Leiningen and Maven must be installed.

The Irutils 2.1 inverted file library is required by the latest version of CTB. In a separate directory, clone, compile, and install irutils version 2.1 into your local Maven (and Leiningen) repository:

$ git clone https://github.com/willjrogers/irutils.git
$ cd irutils/java
$ git branch rel2.1 rel-2.1
$ git checkout rel2.1
$ mkdir -p src/main
$ (cd src/main && ln -s ../../sources java)
$ mvn install

Go to the "ctb" directory, then compile and package CTB:

$ cd ctb
$ lein uberjar

If the uberjar builds successfully, the steps in the usage section above should work normally.

For Developers

Running the system in Apache Tomcat

If you have Tomcat, you can use the file target/ctb-0.1.0-SNAPSHOT-standalone.war to deploy the system to Tomcat.

The application now expects the config directory (containing ctb.properties) and the data directory (containing the indexes) to be in the war-resources subdirectory before the war file is built with the command: lein ring uberwar.

Note: CTB has not been extensively tested in Tomcat and may require modification to work properly.

License

CTB is a product of the U.S. Government and is not subject to copyright.

For more information see: http://www.usa.gov/government-works