Domain Specific Pre-training Model #4

Closed
abhinandansrivastava opened this issue Mar 25, 2019 · 13 comments


@abhinandansrivastava

Hi,

I have run the run_pretraining.py script on my domain-specific data.

It seems like only checkpoints are saved: I got two files, 0000020.params and 0000020.states.

How can I save the model, or build one from the .params and .states files in the checkpoint folder, so that I can use it to get contextual embeddings?

Can someone please help me with this?

@jhyuklee (Collaborator)

Hi,

The run_pretraining.py script is exactly the same as the one at https://github.com/google-research/bert, so you can get help there. We used our own modified version of the script (which is not shared) to handle multi-GPU and server-specific issues when saving the models, so your results may differ from what you'd get with the original script.

Thank you.
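For reference, if you re-run pre-training with the original TensorFlow run_pretraining.py, it writes standard model.ckpt-* checkpoint files rather than .params/.states; the checkpoint itself is the model, and you can inspect it and pass it to downstream scripts via --init_checkpoint. A minimal sketch (assuming TF 1.x and placeholder paths):

```python
# Inspect the variables stored in a TF 1.x pre-training checkpoint.
# The output directory below is a placeholder; replace it with your own.
import tensorflow as tf  # TF 1.x

ckpt = tf.train.latest_checkpoint("/path/to/pretraining_output")
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)  # e.g. bert/encoder/layer_0/attention/self/query/kernel (768, 768)
```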

@abhinandansrivastava (Author) commented Mar 27, 2019

Hi,
After running the BERT model I get an embedding for each word in a sentence, but I need a sentence embedding. How can I get that?

Thanks

@Sriyella commented Mar 27, 2019

Hi,

Is there any way to load this model with TensorFlow Hub's hub.Module()? If not, how can we use the model to get the embeddings?

Please suggest a way forward.

@jhyuklee (Collaborator)

Hi @abhinandansrivastava,
you can use the [CLS] token for sentence embedding or classification. Thanks.
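For a concrete starting point, here is a rough sketch of pulling out the [CLS] vector with the TF 1.x code from google-research/bert. It is only an illustration (not part of the official BioBERT code), the paths are placeholders, and note that get_pooled_output() additionally runs BERT's pooler (a dense layer + tanh) on top of the raw [CLS] hidden state.

```python
# A minimal sketch of extracting the [CLS] vector, assuming the
# google-research/bert repo (modeling.py, tokenization.py) is on PYTHONPATH
# and the BioBERT files live at the placeholder path below.
import tensorflow as tf  # TF 1.x

import modeling      # from google-research/bert
import tokenization  # from google-research/bert

BIOBERT_DIR = "/path/to/biobert"                  # placeholder
VOCAB = BIOBERT_DIR + "/vocab.txt"
CONFIG = BIOBERT_DIR + "/bert_config.json"
CKPT = BIOBERT_DIR + "/model.ckpt"

# Tokenize one sentence; BioBERT uses the cased vocabulary.
tokenizer = tokenization.FullTokenizer(vocab_file=VOCAB, do_lower_case=False)
tokens = ["[CLS]"] + tokenizer.tokenize("Aspirin inhibits platelet aggregation.") + ["[SEP]"]
input_ids = tf.constant([tokenizer.convert_tokens_to_ids(tokens)], dtype=tf.int32)
input_mask = tf.ones_like(input_ids)

bert_config = modeling.BertConfig.from_json_file(CONFIG)
model = modeling.BertModel(config=bert_config, is_training=False,
                           input_ids=input_ids, input_mask=input_mask,
                           use_one_hot_embeddings=False)

# [CLS] is the first position of the last hidden layer.
cls_vector = model.get_sequence_output()[:, 0, :]

# Initialize the graph from the pre-trained checkpoint.
tvars = tf.trainable_variables()
assignment_map, _ = modeling.get_assignment_map_from_checkpoint(tvars, CKPT)
tf.train.init_from_checkpoint(CKPT, assignment_map)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(cls_vector).shape)  # (1, hidden_size), e.g. (1, 768)
```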

@jhyuklee (Collaborator)

Hi @Sriyella,
We haven't tried hub.Module(). You can just take the last-layer outputs of BERT (or BioBERT) and save them.

@jhyuklee (Collaborator)

If it's not related to the pre-trained weights of BioBERT, please report BioBERT-related issues at https://github.com/dmis-lab/biobert and BERT-related issues at https://github.com/google-research/bert.

@pyturn commented Mar 29, 2019

Hi,

I am looking for the same thing as @Sriyella: is there any way to load this model with TensorFlow Hub's hub.Module()? If not, how can we use the pre-trained weights to get the embeddings?

@jhyuklee (Collaborator)

This might help!
google-research/bert#60
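For context, google-research/bert#60 points to extract_features.py in that repo, which dumps per-token activations to a JSON-lines file. Below is a hedged sketch of reading that output; the command-line flags and JSON field names reflect my reading of the script, so double-check them against the version you use, and all paths are placeholders.

```python
# Read the JSON-lines output of extract_features.py, e.g. produced by:
#   python extract_features.py \
#     --input_file=sentences.txt \
#     --output_file=features.jsonl \
#     --vocab_file=$BIOBERT_DIR/vocab.txt \
#     --bert_config_file=$BIOBERT_DIR/bert_config.json \
#     --init_checkpoint=$BIOBERT_DIR/model.ckpt \
#     --layers=-1 --max_seq_length=128 --batch_size=8
import json
import numpy as np

with open("features.jsonl") as f:
    for line in f:
        example = json.loads(line)
        # One entry per (sub)token; "layers" holds the requested layers
        # (here only the last layer, index -1).
        tokens = [feat["token"] for feat in example["features"]]
        vectors = np.array([feat["layers"][0]["values"] for feat in example["features"]])
        print(tokens[0], vectors.shape)  # "[CLS]", (num_tokens, hidden_size)
```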

@abhinandansrivastava (Author)

Hi @jhyuklee,
Thanks for the reply.

Do we need to create our own vocab.txt after pre-training a domain-specific model? The output saved by the pre-training process does not contain a vocab.txt or bert_config.json file.

If yes, how?

Thanks

@jhyuklee (Collaborator)

Hi @abhinandansrivastava,

You don't have to create your own vocab.txt if you used the same vocab.txt and bert_config.json during pre-training; just reuse those files alongside the new checkpoint. See #1.

Thanks.
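One practical note (my suggestion, not something from the repo): since the pre-training script only writes checkpoint files, it is convenient to copy the vocab.txt and bert_config.json you pre-trained with into the new output directory, so downstream scripts can find all three pieces in one place. A minimal sketch with placeholder paths:

```python
# Copy the original vocab.txt and bert_config.json next to the newly
# pre-trained checkpoint. Both directories below are placeholders.
import os
import shutil

BASE_BERT_DIR = "/path/to/original/biobert"           # has vocab.txt, bert_config.json
NEW_MODEL_DIR = "/path/to/domain_pretraining/output"  # has model.ckpt-* files

for name in ("vocab.txt", "bert_config.json"):
    shutil.copy(os.path.join(BASE_BERT_DIR, name), os.path.join(NEW_MODEL_DIR, name))
```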

@jhyuklee (Collaborator) commented Apr 1, 2019

Embedding-related issues are tracked at dmis-lab/biobert#23. Closing this issue.

@jhyuklee closed this as completed Apr 1, 2019
@abhinandansrivastava (Author)

Hi @jhyuklee,
The BioBERT vocab.txt and the uncased BERT-Base vocab.txt are different. How did you add newly tokenized words to the BioBERT vocab.txt? Some of its tokens differ from those in the uncased BERT-Base vocab.txt.

@jhyuklee (Collaborator) commented Apr 4, 2019

Hi @abhinandansrivastava,
we used the BERT-Base cased vocabulary, since uppercase often matters in biomedical text. Thanks.
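As a small illustration of why this matters downstream (assuming google-research/bert's tokenization.py is importable and using a placeholder vocab path): with the cased vocabulary the tokenizer must be built with do_lower_case=False, otherwise casing in biomedical terms is thrown away before the vocabulary lookup.

```python
# A minimal sketch of building a cased tokenizer for BioBERT.
import tokenization  # from google-research/bert

tokenizer = tokenization.FullTokenizer(
    vocab_file="/path/to/biobert/vocab.txt",  # placeholder path
    do_lower_case=False,  # BioBERT uses the BERT-Base *cased* vocabulary
)
print(tokenizer.tokenize("BRCA1 mutations increase cancer risk"))
```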
