BertForPreTraining with NSP #6330
Comments
Hi! Supporting the NSP objective is not on our roadmap, due to the reason you've linked and because of insufficient bandwidth. However, similar to the work in #6168 for SOP, we're very open to contributions and would accept a PR adding the BERT NSP objective to the datacollators/datasets.
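For anyone picking this up, the pair-construction half of the NSP objective can be sketched in plain Python. This is a hedged sketch, not transformers API: the function name `make_nsp_examples` is hypothetical, but the labeling convention matches `BertForPreTraining`, where `next_sentence_label` 0 means "sentence B really follows sentence A" and 1 means "sentence B is random".

```python
import random

def make_nsp_examples(documents, rng=None):
    """Build (sentence_a, sentence_b, label) triples as in the BERT paper:
    50% of the time sentence_b is the true next sentence (label 0),
    50% of the time it is a sentence drawn from a different document (label 1).
    `documents` is a list of documents, each a list of sentences."""
    rng = rng or random.Random()
    examples = []
    for doc_idx, doc in enumerate(documents):
        for i in range(len(doc) - 1):
            sent_a = doc[i]
            if rng.random() < 0.5:
                # positive example: the actual next sentence
                sent_b, label = doc[i + 1], 0
            else:
                # negative example: a sentence from another document
                other_idx = rng.choice(
                    [j for j in range(len(documents)) if j != doc_idx]
                )
                sent_b, label = rng.choice(documents[other_idx]), 1
            examples.append((sent_a, sent_b, label))
    return examples

docs = [["a1", "a2", "a3"], ["b1", "b2"]]
pairs = make_nsp_examples(docs, rng=random.Random(0))
```

A data-collator version of this would tokenize each pair with `[SEP]` and segment ids, and feed the label to the model's `next_sentence_label` argument.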
Awesome, I've been working on something similar. Will open a PR, thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@choidongyeon May I ask whether the work on the dataset part used with the BertForPreTraining APIs is finished? Any example code along the lines of run_mlm.py (is there a run_mlm_nsp.py?) would help. Looking forward to your reply, thanks!
❓ Questions & Help
Details
I am trying to train BERT from scratch following a modification of https://huggingface.co/blog/how-to-train, where I use a BertTokenizer and BertForPreTraining. The documentation for BertForPreTraining states that it has two heads on top, one for each pre-training objective (MLM and NSP), but the linked example only demonstrates MLM.
Based on a comment by @LysandreJik in a previous issue, it seems that none of the provided datasets (e.g. LineByLineTextDataset) handle the NSP objective; it was excluded because the RoBERTa paper showed that the NSP objective is not particularly helpful.
@LysandreJik additionally noted that anyone who wants to implement the NSP objective can do so by changing the dataset/training loop, and I was wondering if there were any plans to add support for NSP for the sake of completeness?
It seems that something similar to what is going on in a PR (#6168) for Albert SOP can be done. Is this correct and can anyone provide me with some guidance moving forward?
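To make the question concrete: beyond NSP pair construction, the other input BertForPreTraining needs is MLM `labels`. A minimal sketch of BERT-style masking in plain Python follows; `mask_tokens` is a hypothetical name, but the 80/10/10 scheme and the `-100` ignore index (which PyTorch's cross-entropy loss and transformers use to skip positions) are standard.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, rng, mlm_prob=0.15):
    """BERT-style MLM masking: select ~15% of positions for prediction.
    Of the selected positions, 80% are replaced with the [MASK] id,
    10% with a random token, and 10% are left unchanged.
    Unselected positions get label -100 so the loss ignores them."""
    input_ids = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:
            labels[i] = tok  # predict the original token here
            r = rng.random()
            if r < 0.8:
                input_ids[i] = mask_id
            elif r < 0.9:
                input_ids[i] = rng.randrange(vocab_size)
            # else: keep the original token
    return input_ids, labels
```

Combining this with NSP pair construction in a single collator would yield the full `(input_ids, token_type_ids, labels, next_sentence_label)` batch that BertForPreTraining expects.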