Can't get word embedding #37

Closed
happypanda5 opened this issue Jul 7, 2019 · 4 comments

@happypanda5

Hi, I am trying to get a word embedding vector from BioBERT and compare it with the word embedding vector I get from BERT.

However, I haven't been successful in running BioBERT.

I downloaded the weights from release v1.1-pubmed and, after unzipping them into a folder, ran the following code:

```python
out = open('prepoutput.json', 'w')

import os

os.system('python3 "/content/biobert/extract_features.py" \
  --input_file= "/content/biobert/sample_text.txt" \
  --vocab_file= "/content/biobert_v1.1_pubmed/vocab.txt" \
  --bert_config_file= "/content/biobert_v1.1_pubmed/bert_config.json" \
  --init_checkpoint= "/content/biobert_v1.1_pubmed/model.ckpt.index" \
  --output_file= "/content/prepoutput.json"')
```

The output is "256" and the file "prepoutput.json" is empty.

Please guide me.

Unfortunately, my attempts at converting the weights for use with PyTorch weren't successful either.
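
For anyone hitting the same symptom: os.system returns the raw wait status, so "256" corresponds to an exit code of 1, i.e. the script failed. A plausible corrected invocation is sketched below; the key assumptions (not confirmed in this thread) are that --init_checkpoint should point to the checkpoint prefix model.ckpt rather than the model.ckpt.index file, and that there should be no space between each flag's = and its value:

```python
import subprocess

# Hypothetical corrected invocation of extract_features.py: pass the
# checkpoint prefix (model.ckpt), not the .index file, and attach each
# flag's value directly after '='. check=True raises on failure instead
# of silently returning a nonzero status.
subprocess.run([
    "python3", "/content/biobert/extract_features.py",
    "--input_file=/content/biobert/sample_text.txt",
    "--vocab_file=/content/biobert_v1.1_pubmed/vocab.txt",
    "--bert_config_file=/content/biobert_v1.1_pubmed/bert_config.json",
    "--init_checkpoint=/content/biobert_v1.1_pubmed/model.ckpt",
    "--output_file=/content/prepoutput.json",
], check=True)
```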

@jhyuklee
Member

Hi @happypanda5,
Sorry for the late response. Maybe this comment in #23 can help.
Thanks.

@futong

futong commented Jul 26, 2019

Hi @jhyuklee,
I would also like to get word embeddings. I followed your advice in #23 (comment) and obtained all the word embeddings of a sentence.
But the same word at different positions has a different contextual embedding.
If I want a single embedding per word, what should I do? Should the input contain only one word per line? Or something else?
Looking forward to your reply.

@izuna385

izuna385 commented Jul 26, 2019

I think you can try out, for example,
https://github.com/huggingface/pytorch-transformers
Give it vocab.txt, the PyTorch-converted BERT weights, and your sentences.
You can use BERT's last layer, or the average vector of the 12 + 1 layers, or something else to get contextualized word embeddings.
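
For example, with a BioBERT checkpoint already converted to PyTorch, a minimal sketch along these lines might work (the model directory is a hypothetical path, and the output tuple layout should be checked against the installed pytorch-transformers version):

```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

# Hypothetical path to a BioBERT checkpoint converted to PyTorch format.
model_dir = "/content/biobert_v1.1_pubmed_pt"

tokenizer = BertTokenizer.from_pretrained(model_dir)
# output_hidden_states=True makes the model return all hidden states.
model = BertModel.from_pretrained(model_dir, output_hidden_states=True)
model.eval()

tokens = tokenizer.tokenize("[CLS] The treatment reduced tumor size . [SEP]")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    outputs = model(input_ids)
    hidden_states = outputs[-1]  # tuple: embedding layer + 12 encoder layers

last_layer = hidden_states[-1]                       # (1, seq_len, 768)
avg_layers = torch.stack(hidden_states).mean(dim=0)  # average of 12 + 1 layers
```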

@jhyuklee
Member

jhyuklee commented Aug 8, 2019

Hi @futong,
the extract_features.py script gives you the embeddings of the last k layers, as defined by the "layers" input argument (see

flags.DEFINE_string("layers", "-1,-2,-3,-4", "")

), and the position/segment/wordpiece embeddings are already included in the first layer.
Thanks.
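
For illustration, extract_features.py writes one JSON object per line, with per-token vectors for each requested layer. A minimal sketch for reading it back (field names follow the upstream BERT script and should be verified against your own output):

```python
import json

# Read extract_features.py output: one JSON object per input example.
with open('prepoutput.json') as f:
    for line in f:
        example = json.loads(line)
        for feature in example['features']:
            token = feature['token']
            # feature['layers'] has one entry per requested layer,
            # e.g. indices -1, -2, -3, -4.
            last_layer = feature['layers'][0]['values']
            print(token, len(last_layer))  # 768 dimensions for BERT-Base
```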

jhyuklee closed this as completed on Aug 8, 2019.