From f3d4399e2b58e0ffff10f404e195b8e0488c7782 Mon Sep 17 00:00:00 2001
From: Jinhyuk Lee
Date: Thu, 11 Apr 2019 09:53:33 +0900
Subject: [PATCH] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index f99e6b7..ce39116 100644
--- a/README.md
+++ b/README.md
@@ -5,9 +5,9 @@ This repository provides pre-trained weights of BioBERT, a language representati
 
 ## Downloading pre-trained weights
 Go to the [releases](https://github.com/naver/biobert-pretrained/releases) section of this repository and download the pre-trained weights of BioBERT. We provide three combinations of pre-trained weights: BioBERT (+ PubMed), BioBERT (+ PMC), and BioBERT (+ PubMed + PMC). Pre-training was based on the [original BERT code](https://github.com/google-research/bert) provided by Google, and training details are described in our paper. Currently available versions of the pre-trained weights are as follows:
-* **BioBERT v1.0 (+ PubMed 200K)** - based on BERT-base (same vocabulary)
-* **BioBERT v1.0 (+ PMC 270K)** - based on BERT-base (same vocabulary)
-* **BioBERT v1.0 (+ PubMed 200K + PMC 270K)** - based on BERT-base (same vocabulary)
+* **BioBERT v1.0 (+ PubMed 200K)** - based on BERT-base-Cased (same vocabulary)
+* **BioBERT v1.0 (+ PMC 270K)** - based on BERT-base-Cased (same vocabulary)
+* **BioBERT v1.0 (+ PubMed 200K + PMC 270K)** - based on BERT-base-Cased (same vocabulary)
 
 Make sure to specify the version of the pre-trained weights used in your work. Note that since we use the WordPiece vocabulary (`vocab.txt`) provided by Google, any new word in the biomedical corpus can be represented with subwords (for instance, Leukemia => Leu + ##ke + ##mia). Building a new subword vocabulary for BioBERT could lose compatibility with the original pre-trained BERT. More details are in the closed [issue #1](https://github.com/naver/biobert-pretrained/issues/1).
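
The WordPiece behavior the changed README text describes can be seen directly. Below is a minimal sketch of tokenizing a biomedical term with a cased BERT-base vocabulary; the Hugging Face `transformers` package and the `bert-base-cased` checkpoint name are assumptions for illustration (the patch itself only says the weights are based on BERT-base-Cased, and pre-training used Google's original BERT code):

```python
# Minimal sketch: how WordPiece splits an out-of-vocabulary biomedical
# term into subwords using a cased, general-domain BERT-base vocabulary.
# Assumes the Hugging Face `transformers` package; BioBERT itself was
# pre-trained with Google's original BERT code, not this library.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

# "Leukemia" is not a single entry in the general-domain vocab.txt, so
# WordPiece falls back to subword pieces; the README cites
# Leukemia => Leu + ##ke + ##mia as an example of this splitting.
print(tokenizer.tokenize("Leukemia"))
```

Because the subword pieces are composed from the original `vocab.txt`, any downstream checkpoint that keeps this vocabulary stays compatible with the original BERT weights, which is the compatibility concern raised in the linked issue.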