Empty dev.tsv files in the GAD dataset after download #41

MaxwellWibert · 2023-08-08T16:19:25Z

I cloned the repo and ran download.sh, and found a dev.tsv file in each of the numbered folders. However, each of those files was completely empty. Is there some other preprocessing script that is responsible for populating these files?

wonjininfo · 2023-08-08T18:26:27Z

Hi Maxwell,
For the GAD dataset, we chose to evaluate our model using 10-fold cross-validation, as it is a very small dataset. Therefore, there is no fixed Train-Dev-Test split, nor do we have a dev.tsv.

Unfortunately, GAD might not be the most ideal resource for evaluating LMs. However, in the five years since the BioBERT paper was published, there have been significant efforts in creating resources for relation extraction in NLP. This has led to the availability of other relatively abundant resources for BioRE (to name a few: DrugProt, BioRED).

MaxwellWibert · 2023-08-09T15:02:14Z

Thank you for your response! We agree GAD is not ideal as an LM evaluation dataset. However, both BioBERT and its derivatives have become common benchmarks in the field, so we often must try to recreate your original datasets. I'm afraid my institution's decision to use GAD as a benchmarking set is above my pay grade.

Maybe this is a silly question, but did you generate the 10-fold cross-validation by just looping over the 10 train.tsv files and setting the current file to be the validation set?

In other words, is the k-fold structured as follows?
First iteration: 1/train.tsv for validation, 2/train.tsv through 10/train.tsv for training
...
i-th iteration: i/train.tsv for validation, k/train.tsv for all k != i for training
...
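
To make the question concrete, the scheme I am asking about can be sketched as below. This is only my guess at the fold construction, not a confirmed description of how the BioBERT folds were built. The root directory name `GAD` and the helper name `leave_one_fold_out` are assumptions for illustration; the only thing taken from the download is the layout of numbered folders, each containing a train.tsv.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def leave_one_fold_out(
    root: str, n_folds: int = 10
) -> Iterator[Tuple[Path, List[Path]]]:
    """Yield (validation_file, training_files) once per fold.

    Assumes the layout <root>/1/train.tsv ... <root>/<n_folds>/train.tsv.
    The paths are only constructed here, not opened or checked.
    """
    files = [Path(root) / str(i) / "train.tsv" for i in range(1, n_folds + 1)]
    for i, val_file in enumerate(files):
        # Every file except the held-out one is used for training.
        train_files = files[:i] + files[i + 1:]
        yield val_file, train_files


if __name__ == "__main__":
    # "GAD" is a hypothetical root directory name.
    for val_file, train_files in leave_one_fold_out("GAD"):
        print(val_file, "held out;", len(train_files), "files for training")
```

Each iteration holds out exactly one train.tsv and trains on the other nine, so every file is used for validation once.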
