The RoBERTa corpus is a combination of multiple sources. Was any filtering applied?
The bookcorpus dataset alone has 74M rows, but I noticed that your RoBERTa folder is named 20M. May I ask what rules you used to filter the final data? I would appreciate a detailed description or, if possible, a public release of your RoBERTa training data. Thank you for your help.