Biomarkers are measurable indicators of biological conditions, such as biomolecules, that can be used to diagnose diseases. Proteins are directly related to many diseases. Because of this direct connection, proteins are often reported in scientific journals as biomarker candidates. However, information about which protein has already been published for which disease is not always freely and quickly available for all diseases and biomarkers. In addition, the proteins are not systematically identified as biomarkers, nor is there an assessment of how well established and validated they are in research. In this work, we extend the capabilities of the BIONDA database by adding a named entity recognition (NER) model.
Three different approaches are tested: (Bio)BERT, (Pub)BERT, and LSTM + CRF (HunFlair). The best-performing approach is used to extend the BIONDA database.