Web-based search engine to show similar clinical documents to a user-input clinical snippet

sociocom/prism-search

PRISM Search

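A document search system that, given a clinical document entered by the user, presents similar documents from a collection of PRISM-annotated clinical documents.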

Web-based search engine to show similar clinical documents to a user-input clinical snippet

Requirements

  • Python 3.8 (may work with 3.6+, but this is untested)
  • scikit-learn
  • mojimoji
  • MedNER-J
  • Flask

Installation

If you use Poetry, run poetry install. Otherwise, install the dependencies with pip (version 20.0.0 or later) by running pip install -r requirements.txt. You may want to create a virtual environment first.

You need to prepare a PRISM-annotated document source to search against. The script preprocess.py is provided for this purpose; please adapt its code to the format of your own document data. A second script, prepro.py, is another example, written for PRISM's Q3 data.

Once setup is complete, you should be able to run the server in Flask's development mode with python app.py. Be aware that the app uses the PRISM Q3 data by default, so you need to change the DATA source in app.py to point to your preprocessed data.

The procedure for deploying this app to a production environment depends on the web server's configuration. Please consult your server administrators.

Usage

  1. At / (root), submit a clinical document to find text relevant to it
  2. At /result, you will see the NER result for your input and the three top-ranked "similar" documents
  3. You can modify the similarity criteria:
    • Options to calculate similarity among clinical docs
    • Clinical NE tags to consider in similarity search
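The tag option above can be pictured as a filter applied to the NER output before similarity is calculated. The following is a minimal sketch only: the (text, tag) pair layout, the function name filter_by_tags, and the tag names "disease" and "medicine" are all placeholders made up for illustration, not the app's actual data format or PRISM's tag set.

```python
def filter_by_tags(entities, selected_tags):
    """Keep the surface text of entities whose tag is in selected_tags.

    `entities` is assumed to be a list of (text, tag) pairs; this layout
    is hypothetical and is not taken from the app's code.
    """
    return [text for text, tag in entities if tag in selected_tags]


# Made-up NER output with placeholder tag names.
ner_result = [
    ("fever", "disease"),
    ("aspirin", "medicine"),
    ("cough", "disease"),
]

# Only entities tagged "disease" would then take part in the similarity search.
disease_only = filter_by_tags(ner_result, {"disease"})
print(disease_only)
```

Filtering before vectorisation means that unselected entity types contribute nothing to the similarity score, which is one simple way to realise "tags to consider".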

How it works

This app first applies PRISM-based clinical NER to your input document. The NER result is then used to calculate similarity against the search-source documents, which have been processed with NER in advance.

The current version's similarity calculation is simply based on a so-called "bag of named entities" (BoNE). As with the "bag of words" (BoW), each document is vectorised into occurrence counts of the named entities that appear anywhere in the source collection. The "similarity" between documents is then calculated with the cosine similarity measure.
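The BoNE idea can be illustrated with a short, standard-library-only sketch. It is not the app's implementation (which may well rely on scikit-learn, listed in the requirements); the function names, the toy entity lists, and the tie-breaking rule are assumptions made for this example.

```python
import math
from collections import Counter


def bone_vector(entities, vocabulary):
    """Count how often each vocabulary entity occurs in one document."""
    counts = Counter(entities)
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(query_entities, source_docs, top_k=3):
    """Return (document index, score) pairs for the top_k most similar source documents."""
    # The vocabulary is every named entity seen in the whole source collection.
    vocabulary = sorted({entity for doc in source_docs for entity in doc})
    query_vec = bone_vector(query_entities, vocabulary)
    scored = [
        (cosine_similarity(query_vec, bone_vector(doc, vocabulary)), index)
        for index, doc in enumerate(source_docs)
    ]
    # Highest score first; ties are broken by document order (an arbitrary choice).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(index, score) for score, index in scored[:top_k]]


# Toy source collection: each document is the list of entities found by NER.
source_docs = [
    ["fever", "cough", "pneumonia"],
    ["headache", "nausea"],
    ["fever", "cough", "cough"],
    ["fracture"],
]
ranking = rank_similar(["fever", "cough"], source_docs)
print(ranking)
```

Because the vocabulary is built from the source collection only, entities in the query that never occur in the source are ignored in this sketch.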

This similarity calculation can be regarded as a baseline for this purpose. Further improvements could be implemented.

Development

Developed by Shuntaro Yada at the Social Computing Lab, NAIST.

Licence

To be announced.
