BertForPreTraining with NSP #6330
Comments
Hi! Supporting the NSP objective is not on our roadmap, due to the reason you've linked and because of insufficient bandwidth. However, similar to the work in #6168 for SOP, we're very open to contributions and would accept a PR adding the BERT NSP objective to the datacollators/datasets.
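For anyone picking this up, the pair-construction half of the NSP objective can be sketched in plain Python. This is a hedged sketch, not transformers API: the function name `make_nsp_examples` is hypothetical, but the labeling convention matches `BertForPreTraining`, where `next_sentence_label` 0 means "sentence B really follows sentence A" and 1 means "sentence B is random".

```python
import random

def make_nsp_examples(documents, rng=None):
    """Build (sentence_a, sentence_b, label) triples as in the BERT paper:
    50% of the time sentence_b is the true next sentence (label 0),
    50% of the time it is a sentence drawn from a different document (label 1).
    `documents` is a list of documents, each a list of sentences."""
    rng = rng or random.Random()
    examples = []
    for doc_idx, doc in enumerate(documents):
        for i in range(len(doc) - 1):
            sent_a = doc[i]
            if rng.random() < 0.5:
                # positive example: the actual next sentence
                sent_b, label = doc[i + 1], 0
            else:
                # negative example: a sentence from another document
                other_idx = rng.choice(
                    [j for j in range(len(documents)) if j != doc_idx]
                )
                sent_b, label = rng.choice(documents[other_idx]), 1
            examples.append((sent_a, sent_b, label))
    return examples

docs = [["a1", "a2", "a3"], ["b1", "b2"]]
pairs = make_nsp_examples(docs, rng=random.Random(0))
```

A data-collator version of this would tokenize each pair with `[SEP]` and segment ids, and feed the label to the model's `next_sentence_label` argument.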
Awesome, I've been working on something similar. Will open a PR, thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@choidongyeon May I ask whether the work on the dataset part used with the BertForPreTraining APIs is finished? Any example code along the lines of run_mlm.py (is there a run_mlm_nsp.py?) would help. Looking forward to your reply, thanks!
❓ Questions & Help
Details
I am trying to train BERT from scratch following a modification of https://huggingface.co/blog/how-to-train, where I use a BertTokenizer and BertForPreTraining. The documentation for BertForPreTraining states that it has two heads on top, one for each pre-training objective (MLM and NSP), but the linked example only demonstrates MLM.
Based on a comment by @LysandreJik in a previous issue, it seems that none of the provided datasets (e.g. LineByLineTextDataset) handle the NSP objective; it was excluded because the RoBERTa paper showed that the NSP objective is not particularly helpful.
@LysandreJik additionally noted that anyone who wants to implement the NSP objective can do so by changing the dataset/training loop, and I was wondering if there were any plans to add support for NSP for the sake of completeness?
It seems that something similar to what is going on in a PR (#6168) for Albert SOP can be done. Is this correct and can anyone provide me with some guidance moving forward?
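To make the question concrete: beyond NSP pair construction, the other input BertForPreTraining needs is MLM `labels`. A minimal sketch of BERT-style masking in plain Python follows; `mask_tokens` is a hypothetical name, but the 80/10/10 scheme and the `-100` ignore index (which PyTorch's cross-entropy loss and transformers use to skip positions) are standard.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, rng, mlm_prob=0.15):
    """BERT-style MLM masking: select ~15% of positions for prediction.
    Of the selected positions, 80% are replaced with the [MASK] id,
    10% with a random token, and 10% are left unchanged.
    Unselected positions get label -100 so the loss ignores them."""
    input_ids = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:
            labels[i] = tok  # predict the original token here
            r = rng.random()
            if r < 0.8:
                input_ids[i] = mask_id
            elif r < 0.9:
                input_ids[i] = rng.randrange(vocab_size)
            # else: keep the original token
    return input_ids, labels
```

Combining this with NSP pair construction in a single collator would yield the full `(input_ids, token_type_ids, labels, next_sentence_label)` batch that BertForPreTraining expects.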