
Load BioBERT weights #135

Closed
JohnGiorgi opened this issue May 18, 2019 · 9 comments
JohnGiorgi commented May 18, 2019

Figure out how to load BioBERT's weights.

See these links for help.

@JohnGiorgi JohnGiorgi added enhancement New feature or request feature labels May 18, 2019
@JohnGiorgi JohnGiorgi self-assigned this May 18, 2019

JohnGiorgi commented May 21, 2019

Documenting how I finally got this to work:

1. Download the latest BioBERT pre-trained models from here. This was the only model I could convert without issue.

2. Assuming pytorch_pretrained_bert is installed (pip install pytorch_pretrained_bert if not), convert the TensorFlow checkpoint to a PyTorch weights file:

       export BERT_BASE_DIR=path/to/biobert_v1.1_pubmed
       pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch $BERT_BASE_DIR/model.ckpt-1000000 $BERT_BASE_DIR/bert_config.json $BERT_BASE_DIR/pytorch_model.bin

   where BERT_BASE_DIR should point to the downloaded and uncompressed BioBERT model.

3. Place pytorch_model.bin, bert_config.json, and vocab.txt (from BERT_BASE_DIR) in a folder (e.g. biobert) and gzip it:

       tar -cvzf biobert.gz biobert

4. The model can then be loaded with pytorch_pretrained_bert like so:

       from pytorch_pretrained_bert import BertForTokenClassification, BertTokenizer

       model = BertForTokenClassification.from_pretrained('path/to/biobert.gz', num_labels=num_labels)
       tokenizer = BertTokenizer.from_pretrained('path/to/biobert.gz', do_lower_case=False)

@jhyuklee

Hi, we've updated all the other BioBERT weights (v1.0) as the same format as v1.1, so it should work now.
Thank you.

@JohnGiorgi

That’s great, thanks for letting me know. Is there any reason to use v1.0 if I just want the best performance possible? Or should I stick with v1.1?

@jhyuklee

For most tasks, it will be better to stick with v1.1, but v1.0 (+PubMed 200K +PMC 270K) works well too, as shown in the paper (only minor differences). Note that we haven't updated our paper with the performance of v1.1 (it will take some time).
If performance on a single targeted task matters, you can compare them and choose what to use.

@JohnGiorgi

Right. Okay great, thanks for the response!

@Colelyman

Thanks for sharing what worked for you. I followed the steps provided and everything worked, except I discovered (as of writing) that when compressing the files together they can't be inside a directory; they have to sit flat at the top level of the archive.
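For reference, a flat archive can be produced from Python with tarfile by setting arcname to the bare file name (no directory prefix). A minimal sketch using empty stand-in files in place of the real pytorch_model.bin, bert_config.json, and vocab.txt:

```python
import os
import tarfile
import tempfile

# Empty stand-ins for the three files the steps above place in the archive.
workdir = tempfile.mkdtemp()
filenames = ["pytorch_model.bin", "bert_config.json", "vocab.txt"]
for name in filenames:
    open(os.path.join(workdir, name), "wb").close()

archive = os.path.join(workdir, "biobert.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for name in filenames:
        # arcname=name keeps the archive flat: no leading directory component
        tar.add(os.path.join(workdir, name), arcname=name)

with tarfile.open(archive, "r:gz") as tar:
    members = tar.getnames()
print(members)  # each entry is a bare file name with no directory prefix
```

The shell equivalent is `tar -czf biobert.gz -C biobert pytorch_model.bin bert_config.json vocab.txt`, which archives the files without their parent directory.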


phaniram-sayapaneni commented Aug 5, 2020

> Hi, we've updated all the other BioBERT weights (v1.0) as the same format as v1.1, so it should work now.
> Thank you.

Hi @jhyuklee, the downloaded files [BioBERT-Base v1.1 (+ PubMed 1M)] do not contain a .ckpt file; they contain: model.ckpt-1000000.data-00000-of-00001, model.ckpt-1000000.index, model.ckpt-1000000.meta

Which one is the actual checkpoint file? When I try to load weights from model.ckpt-1000000.data-00000-of-00001 with tf.train.list_variables('model.ckpt-1000000.data-00000-of-00001'), it throws this error:

DataLossError: Unable to open table file biobert_v1.1_pubmed/model.ckpt-1000000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

I need to port BioBERT to PyTorch to be able to compare it with other SOTA/research models.
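For context (not spelled out in the thread itself): a TensorFlow v1 checkpoint is saved as three files sharing a common prefix, and APIs such as tf.train.list_variables() expect that shared prefix (here model.ckpt-1000000), not any individual file. A sketch of recovering the prefix from the three file names listed above:

```python
# The three files that together form one TensorFlow v1 checkpoint.
files = [
    "model.ckpt-1000000.data-00000-of-00001",
    "model.ckpt-1000000.index",
    "model.ckpt-1000000.meta",
]

def checkpoint_prefix(name):
    # Strip the per-file suffix to recover the common checkpoint prefix.
    for suffix in (".index", ".meta"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    # Data shards look like <prefix>.data-00000-of-00001
    if ".data-" in name:
        return name.split(".data-")[0]
    return name

prefixes = {checkpoint_prefix(f) for f in files}
print(prefixes)  # {'model.ckpt-1000000'}
```

This is why the conversion command above passes $BERT_BASE_DIR/model.ckpt-1000000: the shared prefix stands in for the whole checkpoint, and passing the .data shard directly produces exactly the "not an sstable" DataLossError quoted above.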

@phaniram-sayapaneni

Hi @JohnGiorgi, I tried the steps you mentioned, but got this error while loading the gz file:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

any hints?
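One possible explanation (an educated guess, not confirmed in the thread): byte 0x8b at position 1 is the second byte of the gzip magic number (1f 8b), which suggests the compressed archive is being read as UTF-8 text, e.g. because the loading code was given the .gz path where it expected an already-uncompressed file. A quick check of the magic bytes using a throwaway gzip file:

```python
import gzip
import os
import tempfile

# Write a small gzip file and inspect its first two bytes.
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(path, "wb") as f:
    f.write(b"hello")

with open(path, "rb") as f:
    magic = f.read(2)
print(magic)  # b'\x1f\x8b' -- 0x8b at position 1, matching the error above
```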

@JohnGiorgi

Hi @phaniram-sayapaneni,

Are you simply looking to load BioBERT with HF Transformers? If so, you can follow this code: https://huggingface.co/monologg/biobert_v1.1_pubmed.

If you search for BioBERT here, you can see several variants and how to load them.
