Why does vocab not appear to be medically oriented? #4

Closed
mikerossgithub opened this issue Feb 1, 2019 · 1 comment


mikerossgithub commented Feb 1, 2019

I was trying to diagnose why BioBERT is underperforming a standard BERT model on a PubMed task. I took a look at vocab.txt and noticed that the words are not what one would expect from PubMed word frequencies.

Running grep flix vocab.txt returns "Netflix" (46 PubMed occurrences) but not "infliximab" (>13000 PubMed occurrences).

Why is this?
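
For anyone repeating this check, here is a minimal Python sketch of the same diagnosis: a grep-style substring search over the vocabulary, plus a greedy longest-match WordPiece split to show what happens to a word that has no entry of its own. The `toy_vocab` set and the helper names (`load_vocab`, `grep_vocab`, `wordpiece`) are hypothetical and chosen for illustration; the toy entries are not copied from the real vocab.txt, so the split shown is not necessarily what the actual tokenizer produces.

```python
def load_vocab(path):
    """Read a one-token-per-line vocab file (e.g. vocab.txt) into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def grep_vocab(fragment, vocab):
    """Return every vocab entry containing `fragment`, like `grep fragment vocab.txt`."""
    return sorted(token for token in vocab if fragment in token)


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word into WordPiece tokens.

    Non-initial pieces carry the "##" continuation prefix. If some position
    cannot be matched at all, the whole word maps to `unk`.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, NOT the real vocab.txt contents.
toy_vocab = {"Netflix", "in", "##fl", "##ix", "##ima", "##b"}

print(grep_vocab("flix", toy_vocab))
print(wordpiece("Netflix", toy_vocab))
print(wordpiece("infliximab", toy_vocab))
```

To run it against the real file, replace `toy_vocab` with `load_vocab("vocab.txt")`. A word that comes back as many short pieces is still representable by the model; it just has no dedicated vocabulary entry.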

@mikerossgithub

Never mind, this is addressed in naver/biobert-pretrained#1.
