BERT vs Word2vec #362

Open
vikaschib7 opened this issue Jan 14, 2019 · 6 comments

Comments

@vikaschib7

Hello All,

Can you please help me out in getting similar words from the BERT model, as we do in Word2Vec?

Best regards,
Vikas

@TinaB19

TinaB19 commented Jan 15, 2019

https://github.com/hanxiao/bert-as-service

@crapthings

@Tina-19 Does that repo just do encoding?

@TinaB19

TinaB19 commented Jan 16, 2019

Yes, it uses BERT to produce fixed-length sentence vectors, which can be used in place of Word2Vec embeddings.
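As a sketch of what "used instead of Word2Vec" means in practice: once an encoder such as bert-as-service has produced fixed-length vectors, "similar" items can be ranked by cosine similarity. The texts and 4-dimensional vectors below are toy stand-ins, not real BERT output:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def most_similar(query_vec, vecs, texts, top_k=3):
    # Rank candidate texts by cosine similarity to the query vector.
    scored = sorted(((cosine(query_vec, v), t) for v, t in zip(vecs, texts)),
                    reverse=True)
    return [(t, s) for s, t in scored[:top_k]]

# Toy vectors standing in for BERT sentence embeddings.
texts = ["radar sensor companies", "lidar manufacturers", "apple pie recipe"]
vecs = [
    [0.9, 0.1, 0.0, 0.1],
    [0.8, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.3],
]
query = [0.85, 0.15, 0.05, 0.05]
print(most_similar(query, vecs, texts))
```

With real embeddings the same ranking step applies unchanged; only the vectors come from the BERT server instead of being hand-written.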

@vikaschib7
Author

vikaschib7 commented Jan 23, 2019

Is it possible to find similar words with BERT? For example, if I search for "radar sensor companies", can I get words related to that query? @Tina-19 @andrewluchen @jacobdevlin-google

@HoaiDuyLe

I think this link can help you.
#60

@apogiatzis

One thing to realise is that word2vec provides context-free (static) embeddings, whereas BERT gives contextualised (dynamic) embeddings. For instance, take the two sentences "I like apples" and "I like apple MacBooks". Word2vec will give the same embedding for the word "apple" in both sentences, whereas BERT will give you a different one depending on the context.
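That distinction can be shown with a toy model (the vectors and the context-mixing function below are made up for illustration; real BERT does nothing this simple):

```python
# Toy static table standing in for Word2Vec: one fixed vector per word type.
STATIC = {
    "i": [0.1, 0.0], "like": [0.0, 0.2],
    "apples": [0.5, 0.5], "apple": [0.5, 0.5],
    "macbooks": [0.9, 0.1],
}

def static_embed(sentence, word):
    # Word2vec-style lookup: the surrounding sentence is ignored entirely.
    return STATIC[word]

def contextual_embed(sentence, word):
    # Crude stand-in for BERT: mix the word's vector with the sentence mean,
    # so the same word gets different vectors in different sentences.
    tokens = sentence.lower().split()
    mean = [sum(STATIC[t][d] for t in tokens) / len(tokens) for d in range(2)]
    return [0.5 * STATIC[word][d] + 0.5 * mean[d] for d in range(2)]

s1, s2 = "i like apples", "i like apple macbooks"
print(static_embed(s1, "apples") == static_embed(s2, "apple"))          # True
print(contextual_embed(s1, "apples") == contextual_embed(s2, "apple"))  # False
```

The static lookup returns the identical vector in both sentences, while even this crude context-mixing function returns different vectors; BERT's attention layers do the mixing far more expressively.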

Now coming back to your question, here is a step by step tutorial I wrote to obtain contextualised embeddings from BERT:
https://towardsdatascience.com/nlp-extract-contextualized-word-embeddings-from-bert-keras-tf-67ef29f60a7b
