UIUC CS598 DL4H Final Project 2022

Team 03

Documentation: The final report and the communication with the original author are available in the /documentation folder

Video Presentation: https://youtu.be/Q7Fltwmcm-k

Reimplementation of the paper "Improving Clinical Outcome Predictions Using Convolution over Medical Entities with Multimodal Learning"

Original paper reference: https://arxiv.org/pdf/2011.12349.pdf

  Batuhan Bardak and Mehmet Tan. 2021. Improving clinical outcome predictions using convolution over 
  medical entities with multimodal learning. Artificial Intelligence in Medicine, 117:102112.

Original code repo: https://github.com/tanlab/ConvolutionMedicalNer

Data

The publicly available MIMIC-III dataset was used for all experiments: https://physionet.org/content/mimiciii/1.4/

Computational Requirements

All experiments were performed using Colab Pro with a TPU/GPU and High-RAM configuration.

Total duration required for the entire code run and results using pre-processed MIMIC-III data: 40 GPU hours

Total duration including all the additional experiments and ablations: 150 GPU hours

Local setup requirements and dependencies

Jupyter notebook

Python 3.7

TensorFlow v1

Usage

Clone the repository to your local machine.

git clone https://github.com/surajbisht1809/CS598_DL4H_Project_Team03_2022Spring.git
cd CS598_DL4H_Project_Team03_2022Spring

Run the MIMIC-Extract pipeline as explained in https://github.com/MLforHealth/MIMIC_Extract.
Alternatively, download all_hourly_data.h5 from GCP at https://console.cloud.google.com/storage/browser/mimic_extract

Prerequisite: PhysioNet access and a GCP account
Copy the output file of the MIMIC-Extract pipeline, named all_hourly_data.h5, to the data folder.
Run 01-Extract-Timeseries-Features_LoS.ipynb to extract the first 24 hours of timeseries features from the MIMIC-Extract raw data.

Execution time: 10 mins

Added LOS > 5 data points

Experimented with data extraction for the first 48 hours
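
The time-window selection in this step can be sketched as follows. This is a minimal standard-library illustration, not the notebook's code: the row layout, the `hours_in` field name, and the `first_hours` helper are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical hourly rows; the real measurements live in all_hourly_data.h5.
Row = Dict[str, float]


def first_hours(rows: List[Row], window: int = 24) -> List[Row]:
    """Keep only measurements recorded in the first `window` hours of a stay."""
    return [row for row in rows if 0 <= row["hours_in"] < window]


rows = [
    {"icustay_id": 1, "hours_in": 0, "heart_rate": 88.0},
    {"icustay_id": 1, "hours_in": 23, "heart_rate": 91.0},
    {"icustay_id": 1, "hours_in": 24, "heart_rate": 95.0},
    {"icustay_id": 1, "hours_in": 47, "heart_rate": 90.0},
]

day_one = first_hours(rows)              # 24-hour window
two_days = first_hours(rows, window=48)  # 48-hour variant
print(len(day_one), len(two_days))
```

Keeping the window as a parameter makes it easy to switch between the 24-hour and 48-hour extractions without duplicating the filter.
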
Copy the ADMISSIONS.csv, NOTEEVENTS.csv, and ICUSTAYS.csv files into the data folder.
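
Before starting the longer notebooks, it may help to confirm that the input files named so far are in place. The check below is a small sketch: the `missing_inputs` helper is hypothetical, and it assumes the data folder sits in the current working directory.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the steps above.
REQUIRED_FILES = ("all_hourly_data.h5", "ADMISSIONS.csv", "NOTEEVENTS.csv", "ICUSTAYS.csv")


def missing_inputs(data_dir: Path, required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """Return the required file names that are not present in `data_dir`."""
    return [name for name in required if not (data_dir / name).is_file()]


missing = missing_inputs(Path("data"))
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```
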
Run 02-Select-SubClinicalNotes_LoS.ipynb to select subnotes based on criteria from all MIMIC-III Notes.

Execution time: 5 mins

Output: sub_notes
Run 03-Preprocess-Clinical-Notes_LoS.ipynb to preprocess the notes.

Execution time: 15 mins

Output: preprocessed_notes
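
As an illustration of what note preprocessing typically involves, the sketch below strips MIMIC-III de-identification placeholders (written as `[** ... **]`), lowercases the text, and collapses whitespace. The specific cleaning rules and the `clean_note` helper are assumptions; the notebook's actual rules may differ.

```python
import re

# MIMIC-III replaces protected health information with [** ... **] placeholders.
DEID_PATTERN = re.compile(r"\[\*\*.*?\*\*\]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_note(text: str) -> str:
    """Remove de-identification placeholders, lowercase, and normalize whitespace."""
    text = DEID_PATTERN.sub(" ", text)
    text = text.lower()
    text = WHITESPACE_PATTERN.sub(" ", text)
    return text.strip()


example = "Admitted on [**2101-10-20**] with chest pain.\n\nStarted   Aspirin 81 mg PO daily."
print(clean_note(example))
```
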
Run 04-Apply-med7-on-Clinical-Notes_LoS.ipynb to extract medical entities.

Execution time: 16-20 hrs. The extraction runs in a for loop, which makes this the most time-consuming step.

Output: ner_df
Download the pretrained Word2Vec & FastText embeddings into the embeddings folder: https://github.com/kexinhuang12345/clinicalBERT
Run 05-Represent-Entities-With-Different-Embeddings_LoS.ipynb to convert medical entities into word representations.

Execution time: 30 mins

Output: six dictionary files for Word2Vec, FastText and Combined
Run 06-Create-Timeseries-Data_LoS.ipynb to prepare the timeseries data to feed into the GRU / LSTM models.
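
One plausible way to turn an extracted entity into the three representations is sketched below with toy lookup tables. Averaging the token vectors and building the combined vector by concatenation are assumptions for illustration, and the helper names are hypothetical; the notebook may aggregate differently.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of the tokens found in `table`; zeros if none are found."""
    found = [table[tok] for tok in tokens if tok in table]
    if not found:
        return [0.0] * dim
    return [sum(values) / len(found) for values in zip(*found)]


def embed_entity(entity: str, w2v: Dict[str, Vector], ft: Dict[str, Vector],
                 w2v_dim: int, ft_dim: int) -> Dict[str, Vector]:
    """Return Word2Vec, FastText, and concatenated vectors for one entity string."""
    tokens = entity.lower().split()
    word2vec = mean_vector(tokens, w2v, w2v_dim)
    fasttext = mean_vector(tokens, ft, ft_dim)
    return {"word2vec": word2vec, "fasttext": fasttext, "combined": word2vec + fasttext}


# Toy 2-dimensional tables standing in for the pretrained embeddings.
toy_w2v = {"aspirin": [1.0, 0.0], "tablet": [0.0, 1.0]}
toy_ft = {"aspirin": [0.5, 0.5], "tablet": [1.5, 0.5]}

vectors = embed_entity("Aspirin tablet", toy_w2v, toy_ft, w2v_dim=2, ft_dim=2)
print(vectors["combined"])  # [0.5, 0.5, 1.0, 0.5]
```
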

Execution time: 30 mins

Output: Training, validation and test data and ids
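
A split by identifier, rather than by row, keeps all records of one stay in the same partition. The sketch below shows one way to do that deterministically; the 70/10/20 proportions, the seed, and the `split_ids` helper are hypothetical and are not taken from the notebook.

```python
import random
from typing import List, Sequence, Tuple


def split_ids(ids: Sequence[int], train_frac: float = 0.7, val_frac: float = 0.1,
              seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle the unique ids with a fixed seed and cut them into three partitions."""
    unique = sorted(set(ids))
    random.Random(seed).shuffle(unique)
    n_train = int(len(unique) * train_frac)
    n_val = int(len(unique) * val_frac)
    train = unique[:n_train]
    val = unique[n_train:n_train + n_val]
    test = unique[n_train + n_val:]
    return train, val, test


train_ids, val_ids, test_ids = split_ids(range(100))
print(len(train_ids), len(val_ids), len(test_ids))
```
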
Run 07-TimeseriesBaseline_LoS.ipynb to run the timeseries baseline model to predict 4 different clinical tasks.

Timeseries base models used: LSTM and GRU

Execution time: 4 hours

Hyperparameters: hidden units: 128 and 256, epochs: 10, model patience: 5, iterations: 10
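
To show how the patience setting interacts with the epoch limit, the sketch below replays a made-up sequence of validation losses through a simple early-stopping rule. It assumes that patience counts consecutive epochs without improvement in validation loss and that "iterations" means repeated training runs; both are interpretations of the settings above rather than confirmed details, and the loss values are invented.

```python
from typing import Sequence

EPOCHS = 10
PATIENCE = 5
HIDDEN_UNITS = (128, 256)
ITERATIONS = 10


def stopping_epoch(val_losses: Sequence[float], patience: int = PATIENCE) -> int:
    """Return the 1-based epoch at which training stops under early stopping."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Invented validation losses for one run of EPOCHS epochs.
val_losses = [0.90, 0.70, 0.60, 0.65, 0.66, 0.64, 0.70, 0.61, 0.50, 0.40]
print(stopping_epoch(val_losses))      # 8: five epochs in a row fail to beat 0.60
print(len(HIDDEN_UNITS) * ITERATIONS)  # 20 training runs per model and task
```

With only 10 epochs and a patience of 5, early stopping can trigger only when the best validation loss occurs in the first five epochs.
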
Run 08-Multimodal-Baseline_LoS.ipynb to run multimodal baseline to predict 4 different clinical tasks.

Timeseries base model used: GRU with averaged multimodal embeddings (Word2Vec, FastText, and concatenated)

Execution time: 5 hours

Hyperparameters: hidden units: 128 and 256, epochs: 10, model patience: 5, iterations: 10
Run 09-Proposed-Model_LoS.ipynb to run the proposed model to predict 4 different clinical tasks.

Proposed model: GRU with three layers of 1D convolution for Word2Vec, FastText and concatenated embeddings

Execution time: 8 hours

Hyperparameters: hidden units: 128 and 256, epochs: 10, model patience: 5, iterations: 10
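
To make the "1D convolution over entity embeddings" idea concrete, the sketch below applies a single convolution filter to a short sequence of toy entity vectors and then takes the maximum response. It is a plain-Python illustration of the operation only: the kernel width, the ReLU activation, the max pooling, and all numbers are assumptions, and the actual model stacks three convolution layers alongside a GRU in TensorFlow.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_single_filter(sequence: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide one filter over the sequence (no padding, stride 1) and apply ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias + sum(
            x * w
            for row, kernel_row in zip(window, kernel)
            for x, w in zip(row, kernel_row)
        )
        outputs.append(max(0.0, total))  # ReLU
    return outputs


# Four toy entity embeddings of dimension 2, in note order.
entity_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One filter spanning two consecutive entities.
kernel = [[1.0, -1.0], [0.5, 0.5]]

feature_map = conv1d_single_filter(entity_vectors, kernel)
pooled = max(feature_map)  # global max pooling over positions
print(feature_map, pooled)  # [1.5, 0.0, 0.0] 1.5
```

Each output position summarizes a window of neighboring entities, which is what lets a convolution layer pick up local patterns across adjacent entities.
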

Results

Result files are available in the /results folder. The summarized result tables are listed below.

Baseline vs Baseline with MultiModal

Best Baseline vs Proposed Model

References

Original Paper reference: https://arxiv.org/pdf/2011.12349.pdf

Original code repo: https://github.com/tanlab/ConvolutionMedicalNer

Download the MIMIC-III dataset via https://mimic.physionet.org/

MIMIC-Extract implementation: https://github.com/MLforHealth/MIMIC_Extract

med7 implementation: https://github.com/kormilitzin/med7

Download Pre-trained Word2Vec & FastText embeddings: https://github.com/kexinhuang12345/clinicalBERT

Preprocessing Script: https://github.com/kaggarwal/ClinicalNotesICU
