Analysis of Traditional NER and Generative LLMs for PICO Annotation in Medical Research 🔬

Bachelor Project Artificial Intelligence @ Vrije Universiteit Amsterdam 🎓

About the Research Paper 📑

Abstract: This study compares the performance of Bidirectional Long Short-Term Memory (BiLSTM) networks and Generative Pre-trained Transformers (GPT) for extracting and annotating PICO (Population, Intervention, Comparison, Outcome) elements from medical research papers. Using the EBM-NLP dataset and a secondary dataset of full-length medical papers, the BiLSTM-CRF model was trained on tokenized text sequences, while the GPT-4o model utilized prompt engineering.

Results: Results indicate that BiLSTM-CRF models excel at token-level annotation with a weighted F1-score of 0.76, whereas GPT-4o achieves a better recall of 0.89 and an F1-score of 0.82 at the abstract level.

Conclusion: The findings suggest that while BiLSTM-CRF offers superior explainability and token-level precision, GPT-4o provides greater flexibility and contextual understanding.

Future Work: Future research should explore hybrid models and validate findings on a larger dataset of full-length papers.

About the Code 👨‍💻

The repository is organized as follows:

code/: Includes the codebase for the experiments.
- main-experiment/: Code and generated data for the primary experiment comparing NLP models.
- secondary-experiment/: Code and generated data for the secondary experiment on full-length papers.
  - full-length-papers/: The manually annotated data consiting of 3 medical papers annotated in their full length.
Comparative Comparison of Traditional NER and Generative Large Language Models for PICO Extraction in Medical Research - M Bartos - Thesis.pdf: Full text of the research paper.
M Bartos - BPAI Poster.png: Scientific poster summarizing key findings.

EBM-NLP Dataset 💽

The dataset used for the main experiment can be downloaded from here.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
code		code
Comparison of Traditional NER and Generative Large Language Models for PICO Extraction in Medical Research - M Bartos - Thesis.pdf		Comparison of Traditional NER and Generative Large Language Models for PICO Extraction in Medical Research - M Bartos - Thesis.pdf
M Bartos - BPAI Poster.png		M Bartos - BPAI Poster.png
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Analysis of Traditional NER and Generative LLMs for PICO Annotation in Medical Research 🔬

About the Research Paper 📑

About the Code 👨‍💻

EBM-NLP Dataset 💽

About

Languages

mrkbrts/vu-bpai

Folders and files

Latest commit

History

Repository files navigation

Analysis of Traditional NER and Generative LLMs for PICO Annotation in Medical Research 🔬

About the Research Paper 📑

About the Code 👨‍💻

EBM-NLP Dataset 💽

About

Topics

Resources

Stars

Watchers

Forks

Languages