BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature [EMNLP 2022]

disi-unibo-nlp/bio-reader

BioReader

Public repository accompanying the EMNLP 2022 long paper "BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature".

BioReader is the first retrieval-enhanced text-to-text model for biomedical natural language processing. By relying on T5 and RETRO blocks, our solution augments the input prompt by fetching and assembling relevant scientific literature chunks from a neural database with ≈60 million tokens centered on PubMed. We fine-tune and evaluate BioReader on a broad array of downstream tasks, significantly outperforming several state-of-the-art methods despite using up to 3x fewer parameters.

BioReader overview

BioReader architecture

Alongside extensive ablation studies, we show that the domain knowledge in the datastore can be altered or supplemented so that the model generates correct predictions without any retraining, which addresses the literature overload issue. We coin the term "zero-shot datastore" for this capability.

BioReader zero-shot datastore
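
To make the zero-shot datastore idea concrete, here is a minimal toy sketch. It is not the BioReader implementation: the `ToyDatastore` class, the `embed` and `cosine` helpers, and the example sentences are all hypothetical, and a bag-of-words cosine similarity stands in for the neural embeddings a real retrieval database would use. It only illustrates the general principle that adding a chunk to the datastore changes what is retrieved for a query, with no model weights involved.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a neural encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyDatastore:
    """Hypothetical in-memory store of text chunks (assumption, not the paper's database)."""

    def __init__(self):
        self.chunks = []
        self.vectors = []

    def add(self, chunk):
        # Extending the store touches no model parameters.
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def retrieve(self, query, k=1):
        # Rank all chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


store = ToyDatastore()
store.add("Aspirin inhibits cyclooxygenase enzymes.")
store.add("Metformin lowers blood glucose in type 2 diabetes.")

# "Drug X" is a made-up entity used only for illustration.
query = "Which drug class does drug X belong to?"
before = store.retrieve(query, k=1)

# Supplement the datastore with new knowledge; nothing is retrained.
store.add("Drug X is a kinase inhibitor.")
after = store.retrieve(query, k=1)

print("before:", before)
print("after: ", after)
```

In this toy setting the newly added chunk becomes the top retrieval result for the query, which is the behaviour the zero-shot datastore relies on: the retrieved context, and therefore the evidence available to the model, can be updated by editing the datastore alone.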

🔎 Paper

Read our paper

✉ Contacts

Cite

If you find BioReader helpful in your research, please cite:

@inproceedings{frisoni-etal-2022-bioreader,
  title     = {BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature},
  author    = {Frisoni, Giacomo and Mizutani, Miki and Moro, Gianluca and Valgimigli, Lorenzo},
  booktitle = {{EMNLP}},
  pages     = {1--24},
  publisher = {Association for Computational Linguistics},
  year      = {2022}
}
