BBQ RoBERTa Base Reproducibility Help #3

gsgoncalves · 2023-01-24T19:19:50Z

Hello,

Congratulations on this great work!

I am reaching out for pointers as I am unable to reproduce the accuracy results from the paper while using RoBERTa-Base.

I fine-tuned the RoBERTa-Base model on the RACE dataset using the LRQA codebase, then followed the LRQA instructions to evaluate on BBQ. However, I obtained an average accuracy of 51.64% across categories, roughly 10 percentage points below the 61.4% reported in the paper.
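
Because the figure above is an average across categories, the result can depend on whether each category is weighted equally (macro average) or by its number of examples (micro average). The sketch below contrasts the two conventions. The function name and the counts are made up for illustration; they are not BBQ results, and which convention the paper uses is not stated here.

```python
def macro_and_micro_accuracy(per_category):
    """Return (macro, micro) accuracy.

    per_category maps a category name to (num_correct, num_examples).
    """
    # Macro: average the per-category accuracies, each category counts once.
    macro = sum(c / n for c, n in per_category.values()) / len(per_category)
    # Micro: pool all examples, so larger categories dominate.
    total_correct = sum(c for c, _ in per_category.values())
    total_examples = sum(n for _, n in per_category.values())
    micro = total_correct / total_examples
    return macro, micro


# Hypothetical counts, chosen only to show that the two averages can differ.
toy_counts = {"category_a": (90, 100), "category_b": (500, 1000)}
macro, micro = macro_and_micro_accuracy(toy_counts)
print(f"macro={macro:.4f} micro={micro:.4f}")
```

If the category sizes differ a lot, comparing both numbers against the reported one may help rule out an averaging mismatch.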

I used the same hyperparameters as reported in the paper:

Total batch size: 16 (simulated with a batch size of 4 and 4 gradient accumulation steps)
Learning rate: 1e-5
Number of epochs: 3
Max token length: 512
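
For reference, the total batch size above follows from multiplying the per-step batch size by the number of accumulation steps. A minimal sketch of that arithmetic is below; the function and parameter names are hypothetical and are not LRQA or transformers arguments. The `num_devices` factor is an assumption added to show that a multi-GPU run would change the effective batch size unless the other values are adjusted.

```python
def effective_batch_size(per_device_batch_size, grad_accum_steps, num_devices=1):
    """Number of examples that contribute to each optimizer step."""
    return per_device_batch_size * grad_accum_steps * num_devices


# Setup described above: batch size 4, 4 accumulation steps, one device.
single_device = effective_batch_size(4, 4)

# Hypothetical two-device run with the same per-device settings.
two_devices = effective_batch_size(4, 4, num_devices=2)

print(single_device, two_devices)
```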

I am using the library versions pinned in the requirements.txt file:

transformers==4.5.2
tokenizers==0.10.1
datasets==1.1.2
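
One way to confirm that the active environment really matches these pins is to compare them against the installed package metadata. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `find_mismatches` and the `PINNED` dictionary are illustrative, with the versions copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
PINNED = {"transformers": "4.5.2", "tokenizers": "0.10.1", "datasets": "1.1.2"}


def find_mismatches(pinned, lookup=version):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    `lookup` is injectable so the check can be exercised without the packages.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means all three pins match the environment the script runs in.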

Do you have any idea why I am not able to reproduce the reported accuracy when following the LRQA instructions? Any pointers would be much appreciated!

Thank you!
Gustavo

zphang · 2023-02-16T20:57:07Z

Hi, let me look into this.
