Exploring Analogy based Applications in Healthcare

This repository contains the source code for the paper "An analogy based framework for patient-stay identification in healthcare", submitted to the ICCBR-ATA 2022 Workshop, and all the work carried out in the M2 thesis titled "Exploring Analogy based Applications in Healthcare". The thesis focuses on two tasks: (1) patient-stay identification, i.e., does a hospital stay belong to a given patient or not?, addressed by our first setting, and (2) disease prognosis, i.e., will a certain disease develop in the same way in two distinct patients?, addressed by our second and third settings. We propose a prototypical architecture that combines patient-stay representation learning with the analogical reasoning framework, and we train a neural model to detect patient-stay analogies. All models are implemented in PyTorch.
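As a rough illustration of the idea, an analogy detector scores a quadruple of stay embeddings a : b :: c : d as valid or not. The following is a minimal, assumed sketch (class name, dimensions, and architecture are hypothetical; the actual models live in the analogy_classif_*.py files):

```python
import torch
import torch.nn as nn

class AnalogyClassifier(nn.Module):
    """Toy binary classifier for quadruples (a, b, c, d) of stay embeddings.

    Hypothetical sketch only: the repository's real classifiers may differ
    in depth, pooling, and how the four embeddings are combined.
    """

    def __init__(self, emb_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4 * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, a, b, c, d):
        # Concatenate the four stay embeddings and score the quadruple.
        x = torch.cat([a, b, c, d], dim=-1)
        return torch.sigmoid(self.net(x)).squeeze(-1)

model = AnalogyClassifier(emb_dim=32)
a, b, c, d = (torch.randn(8, 32) for _ in range(4))
scores = model(a, b, c, d)  # shape (8,), one validity score per quadruple
```

In this sketch the four embeddings come from the embedding networks (cnn_*.py) trained jointly with the classifier.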

Requirements

Dataset

The MIMIC-III database analyzed in this study is available from the PhysioNet repository. Download the database and note its location before running the preprocessing scripts below.
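To give a feel for how stays relate to patients in MIMIC-III, the sketch below pairs each hospital stay with the patient's next stay using pandas. The column names (SUBJECT_ID, HADM_ID, ADMITTIME) match the real ADMISSIONS table, but the data here is synthetic and the pairing logic is an assumption, not the exact code in the preprocessing scripts:

```python
import pandas as pd

# Synthetic stand-in for the MIMIC-III ADMISSIONS table.
admissions = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "HADM_ID": [100, 101, 200],
    "ADMITTIME": pd.to_datetime(["2130-01-01", "2131-06-01", "2129-03-15"]),
})

# Sort each patient's stays chronologically, then pair each stay with the
# next stay of the same patient (its "sequent" stay).
admissions = admissions.sort_values(["SUBJECT_ID", "ADMITTIME"])
admissions["NEXT_HADM_ID"] = admissions.groupby("SUBJECT_ID")["HADM_ID"].shift(-1)

pairs = admissions.dropna(subset=["NEXT_HADM_ID"])
print(pairs[["SUBJECT_ID", "HADM_ID", "NEXT_HADM_ID"]])
# Patient 1's stay 100 is followed by stay 101; patient 2 has no sequent stay.
```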

Installing the Dependencies

Install Anaconda (or miniconda to save storage space).

Then, create a conda environment (for example stay-analogy) and install the dependencies, using the following commands:

$ conda create --name stay-analogy python=3.9
$ conda activate stay-analogy
$ conda install -y pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch -c conda-forge
$ conda install -y numpy scipy pandas scikit-learn
$ conda install -y tqdm gensim nltk

Usage

For the Identity setting (T1), run the following commands:

$ python3 T1_dataset.py # build the dataset of triples for the _Identity setting_
$ python3 preprocess01.py # define patient cohort, collect labels, extract temporal signals, and extract clinical notes
$ python3 preprocess02.py # run full preprocessing to obtain dictionaries
$ python3 doc2vec.py --phase train # train doc2vec model
$ python3 doc2vec.py --phase infer # infer doc2vec vectors

To train and evaluate the classification and the corresponding embedding model on structured and unstructured data, run:

$ python3 Identity/train_cnn_both.py # train classifier model together with the embedding model 
$ python3 Identity/evaluate_cnn_both.py # evaluate a classifier with the corresponding embedding model

To train and evaluate the classification and the corresponding embedding model on only unstructured data, run:

$ python3 Identity/train_cnn_con.py # train classifier model together with the embedding model 
$ python3 Identity/evaluate_cnn_con.py # evaluate a classifier with the corresponding embedding model

To train and evaluate the classification and the corresponding embedding model on only structured data, run:

$ python3 Identity/train_cnn_demo.py # train classifier model together with the embedding model 
$ python3 Identity/evaluate_cnn_demo.py # evaluate a classifier with the corresponding embedding model

For the Identity + Sequent setting (T2), run the following commands:

$ python3 "T2&T3_dataset.py" # build the dataset for the _Identity + Sequent setting_ (quote the filename so the shell does not interpret the &)
# Before using the next four commands, make sure to change the file path in the script depending on the diagnosis level you are exploring, _e.g._, processed_icd_T2, processed_cat_T2, or processed_blk_T2. Do the same for the mimic file path, _e.g._, mimic_icd_T2, mimic_cat_T2, or mimic_blk_T2.
$ python3 preprocess01.py 
$ python3 preprocess02.py # run full preprocessing to obtain dictionaries
$ python3 doc2vec.py --phase train # train doc2vec model
$ python3 doc2vec.py --phase infer # infer doc2vec vectors

To train and evaluate the classification and the corresponding embedding model on structured and unstructured data:
For level 4 code, run the following:

$ python3 Identity+Sequent/T2_train_both_icd.py # train classifier model together with the embedding model 
$ python3 Identity+Sequent/T2_evaluate_both_icd.py # evaluate a classifier with the corresponding embedding model

For level 3 code, run the following:

$ python3 Identity+Sequent/T2_train_both_cat.py # train classifier model together with the embedding model 
$ python3 Identity+Sequent/T2_evaluate_both_cat.py # evaluate a classifier with the corresponding embedding model

For level 2 code, run the following:

$ python3 Identity+Sequent/T2_train_both_blk.py # train classifier model together with the embedding model 
$ python3 Identity+Sequent/T2_evaluate_both_blk.py # evaluate a classifier with the corresponding embedding model

To train and evaluate the classification and the corresponding embedding model on only unstructured data:
For level 4 code, run the following:

$ python3 Identity+Sequent/T2_train_con_icd.py # train classifier model together with the embedding model 
$ python3 Identity+Sequent/T2_evaluate_con_icd.py # evaluate a classifier with the corresponding embedding model

For level 3 code, run the following:

$ python3 Identity+Sequent/T2_train_con_cat.py # train classifier model together with the embedding model 
$ python3 Identity+Sequent/T2_evaluate_con_cat.py # evaluate a classifier with the corresponding embedding model

For level 2 code, run the following:

$ python3 Identity+Sequent/T2_train_con_blk.py # train classifier model together with the embedding model 
$ python3 Identity+Sequent/T2_evaluate_con_blk.py # evaluate a classifier with the corresponding embedding model

For the Identity + Directly Sequent setting (T3), run the following commands:

$ python3 "T2&T3_dataset.py" # build the dataset for the _Identity + Directly Sequent setting_ (quote the filename so the shell does not interpret the &)
# Before using the next four commands, make sure to change the file path in the script depending on the diagnosis level you are exploring, _e.g._, processed_icd_T3, processed_cat_T3, or processed_blk_T3. Do the same for the mimic file path, _e.g._, mimic_icd_T3, mimic_cat_T3, or mimic_blk_T3.
$ python3 preprocess01.py 
$ python3 preprocess02.py # run full preprocessing to obtain dictionaries
$ python3 doc2vec.py --phase train # train doc2vec model
$ python3 doc2vec.py --phase infer # infer doc2vec vectors

To train and evaluate the classification and the corresponding embedding model on structured and unstructured data:
For level 4 code, run the following:

$ python3 Identity+DSequent/T3_train_both_icd.py # train classifier model together with the embedding model 
$ python3 Identity+DSequent/T3_evaluate_both_icd.py # evaluate a classifier with the corresponding embedding model

For level 3 code, run the following:

$ python3 Identity+DSequent/T3_train_both_cat.py # train classifier model together with the embedding model 
$ python3 Identity+DSequent/T3_evaluate_both_cat.py # evaluate a classifier with the corresponding embedding model

For level 2 code, run the following:

$ python3 Identity+DSequent/T3_train_both_blk.py # train classifier model together with the embedding model 
$ python3 Identity+DSequent/T3_evaluate_both_blk.py # evaluate a classifier with the corresponding embedding model

To train and evaluate the classification and the corresponding embedding model on only unstructured data:
For level 4 code, run the following:

$ python3 Identity+DSequent/T3_train_con_icd.py # train classifier model together with the embedding model 
$ python3 Identity+DSequent/T3_evaluate_con_icd.py # evaluate a classifier with the corresponding embedding model

For level 3 code, run the following:

$ python3 Identity+DSequent/T3_train_con_cat.py # train classifier model together with the embedding model 
$ python3 Identity+DSequent/T3_evaluate_con_cat.py # evaluate a classifier with the corresponding embedding model

For level 2 code, run the following:

$ python3 Identity+DSequent/T3_train_con_blk.py # train classifier model together with the embedding model 
$ python3 Identity+DSequent/T3_evaluate_con_blk.py # evaluate a classifier with the corresponding embedding model

Files and Folders

  • data.py: tools to load the dataset, contains the main dataset class Task1Dataset and the data augmentation functions for the Identity setting
  • T2_data_blk.py: tools to load the dataset, contains the main dataset class Task1Dataset and the data augmentation functions for the Identity + Sequent setting in the level 2 code
  • T2_data_cat.py: tools to load the dataset, contains the main dataset class Task1Dataset and the data augmentation functions for the Identity + Sequent setting in the level 3 code
  • T2_data_icd.py: tools to load the dataset, contains the main dataset class Task1Dataset and the data augmentation functions for the Identity + Sequent setting in the level 4 code
  • T3_data_blk.py: tools to load the dataset, contains the main dataset class Task1Dataset and the data augmentation functions for the Identity + Directly Sequent setting in the level 2 code
  • T3_data_cat.py: tools to load the dataset, contains the main dataset class Task1Dataset and the data augmentation functions for the Identity + Directly Sequent setting in the level 3 code
  • T3_data_icd.py: tools to load the dataset, contains the main dataset class Task1Dataset and the data augmentation functions for the Identity + Directly Sequent setting in the level 4 code
  • analogy_classif_both.py: neural network to classify analogies for structured and unstructured data
  • analogy_classif_con.py: neural network to classify analogies for unstructured data
  • analogy_classif_demo.py: neural network to classify analogies for structured data
  • cnn_con.py: neural network to embed clinical notes
  • cnn_dem.py: neural network to embed static information
  • cnn_both.py: neural network to embed static information and clinical notes
  • utils.py: tools for the different codes
  • Identity folder: contains all training and evaluating codes for Identity setting
  • Identity+Sequent folder: contains all training and evaluating codes for Identity + Sequent setting
  • Identity+DSequent folder: contains all training and evaluating codes for Identity + Directly Sequent setting
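The data augmentation functions mentioned above typically exploit the classical postulates of analogical proportion: if a : b :: c : d holds, then so do c : d :: a : b (symmetry) and a : c :: b : d (central permutation), yielding eight equivalent forms in total. A minimal sketch of such augmentation (an assumption about what the data files implement, not the exact code):

```python
def augment_positive(a, b, c, d):
    """Enumerate the equivalent forms of a valid analogy a : b :: c : d.

    Closes the seed quadruple under symmetry and central permutation,
    which yields the classical 8 equivalent quadruples.
    """
    seen, out = set(), []
    queue = [(a, b, c, d)]
    while queue:
        q = queue.pop()
        if q in seen:
            continue
        seen.add(q)
        out.append(q)
        qa, qb, qc, qd = q
        queue.append((qc, qd, qa, qb))  # symmetry: c : d :: a : b
        queue.append((qa, qc, qb, qd))  # central permutation: a : c :: b : d
    return out

forms = augment_positive("a", "b", "c", "d")
print(len(forms))  # 8 equivalent quadruples
```

Invalid permutations of a valid quadruple (e.g., b : a :: c : d) are commonly used as negative examples when training the classifier.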
