Create a dataset loader for GAD corpus #608
Comments
FYI - the issue you linked clearly highlighted that this data, as a weakly labeled dataset, has questionable labels. The author of the paper also said:
Thanks @tmabraham for looking into this. I remember GAD's labels causing confusion when I was tracking down this dataset. This issue was created to stay consistent with the BLURB dataset. However, I think your point (along with others' concerns about this dataset) is very valid. We will discuss and get back to you on this!
Actually @ruisi-su @tmabraham, can we keep this as high priority for implementation? The only valid reason to deprioritize a dataset used in a standard benchmark is if that dataset isn’t public. More generally, as a research question, we’re interested in models trained on labels of different provenance (e.g., weakly supervised) to measure performance tradeoffs. From this perspective, datasets like these are quite valuable.
#self-assign
* add GAD dataset
* update metadata on gad.py
Adding a Dataset