curate-mimic - A repository demonstrating how to curate the MIMIC III database with Natural Language Processing

Preliminary steps:
- Obtain access to the MIMIC III database (ask your PI).
- Obtain a UMLS account and API key
- Install Docker and docker-compose
- If writing to a MongoDB database:
  - Create a local data directory: mkdir mimic_db
  - Start MongoDB with that directory mounted as its data store: docker run --rm --name mongodb -d -v "$(pwd)/mimic_db:/data/db" -p 27017:27017 mongo
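
Docker treats a bare name such as mimic_db on the left of -v as a named volume, not as the local directory, so the host side needs to be a path for the data to land in mimic_db. A minimal sketch of building that command from Python, assuming you prefer to script the setup; mongo_run_command is a hypothetical helper, not part of this repository:

```python
import os
import shlex


def mongo_run_command(data_dir: str = "mimic_db", host_port: int = 27017) -> list[str]:
    """Build `docker run` arguments for a MongoDB container (hypothetical helper).

    The host directory is made absolute so Docker bind-mounts it; a bare
    name like "mimic_db" would instead create a named Docker volume.
    """
    host_path = os.path.abspath(data_dir)
    return [
        "docker", "run", "--rm",
        "--name", "mongodb",
        "-d",
        "-v", f"{host_path}:/data/db",  # persist data in the local directory
        "-p", f"{host_port}:27017",     # MongoDB listens on 27017 inside the container
        "mongo",
    ]


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run(..., check=True) to run it.
    print(shlex.join(mongo_run_command()))
```

Because --rm deletes the container when it stops, the bind-mounted directory is what keeps the database between runs.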

Set up the cTAKES containers:
- git clone git@github.com:Machine-Learning-for-Medical-Language/ctakes-rest-package.git
- cd ctakes-rest-package
- export umls_api_key=<api key from above>
- docker-compose up -d --scale ctakes=N # Starts N cTAKES containers; each needs about 4 GB of RAM.
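
Since each cTAKES container needs roughly 4 GB of RAM, N is bounded by the memory you can spare. A back-of-the-envelope sketch: the 4 GB figure comes from the note above, while the headroom reserved for the OS, MongoDB, and the Python script (reserve_gb) is an assumption to tune for your machine. Both function names are hypothetical:

```python
import math
import os


def max_ctakes_containers(
    total_ram_gb: float,
    reserve_gb: float = 4.0,        # assumed headroom for the OS, MongoDB, and process_mimic.py
    per_container_gb: float = 4.0,  # "around 4 GB RAM" per cTAKES container, per the note above
) -> int:
    """Largest N for `docker-compose up -d --scale ctakes=N` that fits in memory."""
    usable_gb = total_ram_gb - reserve_gb
    return max(0, math.floor(usable_gb / per_container_gb))


def umls_key_is_set() -> bool:
    """True if the exported umls_api_key variable is visible to this process."""
    return bool(os.environ.get("umls_api_key"))


if __name__ == "__main__":
    if not umls_key_is_set():
        print("umls_api_key is not set; export it before running docker-compose")
    print("Suggested N for a 32 GB machine:", max_ctakes_containers(32))
```

For example, a 32 GB machine with the default 4 GB reserve gives (32 - 4) / 4 = 7 containers.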

Run the Python script to process the data (run it with -h for detailed documentation of all options):
- python process_mimic.py --input-path <path to NOTEEVENTS.csv file> --output-format <json|mongo|xmi|fhir> --output_dir <directory to write files if output format is file-based>
- Add --max-notes N to process only a subset of N notes and confirm everything works before running on the whole dataset.
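
To look at a small subset of notes yourself (for example, while sanity-checking a --max-notes run), you can stream rows from NOTEEVENTS.csv with the standard library. A minimal sketch, assuming the standard MIMIC III NOTEEVENTS columns such as ROW_ID, CATEGORY, and TEXT; iter_notes is a hypothetical helper and is not how process_mimic.py reads the file. The TEXT field spans multiple lines inside quotes, so a real CSV parser is needed rather than splitting on newlines:

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, Optional, TextIO


def iter_notes(handle: TextIO, max_notes: Optional[int] = None) -> Iterator[Dict[str, str]]:
    """Yield NOTEEVENTS rows as dicts keyed by column name.

    csv.DictReader handles the multi-line, quoted TEXT field correctly;
    islice stops after max_notes rows (None means read everything).
    """
    return islice(csv.DictReader(handle), max_notes)


if __name__ == "__main__":
    # Toy in-memory data for illustration only; for the real file use
    # open("NOTEEVENTS.csv", newline="") as the handle.
    toy = io.StringIO(
        'ROW_ID,CATEGORY,TEXT\n'
        '1,Nursing,"first line\nsecond line"\n'
        '2,ECG,"toy text"\n'
    )
    for note in iter_notes(toy, max_notes=1):
        print(note["ROW_ID"], note["CATEGORY"], repr(note["TEXT"]))
```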

If you used MongoDB, inspect some of the output:

$ mongosh   # or the legacy mongo shell on MongoDB versions before 6.0

> use mimic

> db.note_nlp.stats()['count'] # Should return the number of notes processed and written to the database
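
The same check can be scripted from Python. A sketch assuming the pymongo driver is installed, MongoDB is listening on localhost:27017 as started above, and results are in the mimic database's note_nlp collection as in the shell session; count_processed and main are hypothetical helpers:

```python
from typing import Any


def count_processed(collection: Any) -> int:
    """Number of documents in a pymongo collection; should match the shell query above."""
    return collection.count_documents({})  # empty filter matches every document


def main(host: str = "localhost", port: int = 27017) -> None:
    # Imported lazily so count_processed can be reused or tested without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(host, port)
    try:
        print(count_processed(client["mimic"]["note_nlp"]))
    finally:
        client.close()
```

Call main() once process_mimic.py has finished writing to MongoDB.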
