Gh3275: sample_missing_splits in SST-2 #3276

Merged: 6 commits, Aug 10, 2023

Conversation

plonerma
Collaborator

See #3275

@alanakbik
Collaborator

@plonerma should this PR make it possible to load GLUE datasets without sampling the test split?

I am getting an error when running this:

from flair.datasets import GLUE_SST2

# default setting: SST-2 with sampling missing splits
corpus = GLUE_SST2()
print(corpus)

# without splits
corpus = GLUE_SST2(sample_missing_splits=False)
print(corpus)

@plonerma
Collaborator Author

plonerma commented Aug 9, 2023

My bad. I fixed the error: it should work now.

Some background:

(Most?) GLUE datasets do not have gold labels for the test split. Hence, in flair they are not treated as test splits but as a separate split (validation), which is used separately to create a TSV file that can be submitted to the GLUE portal.

Since the test split is considered missing, flair by default samples a new test split from the training data.

The naming is clearly a bit unfortunate (e.g. saying "SST2 does not have a test split" is wrong, or at least ambiguous), but functionality-wise it makes sense imo, as the GLUE test split is handled very differently (e.g. no final evaluation on that split). For the other GLUE datasets I implemented, I followed the path already taken in the existing ones.
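For readers unfamiliar with this behaviour, here is a minimal sketch of what "sampling a missing test split from the training data" could look like. It is not flair's actual implementation: the helper name `split_off_test`, the 10% fraction, and the fixed seed are assumptions made purely for illustration. Only the `sample_missing_splits` flag name is taken from the snippet earlier in this thread.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_off_test(
    train: Sequence[T],
    test: Optional[Sequence[T]] = None,
    sample_missing_splits: bool = True,
    fraction: float = 0.1,  # assumed share of train to hold out
    seed: int = 0,          # assumed fixed seed, for reproducibility only
) -> Tuple[List[T], Optional[List[T]]]:
    """Hypothetical helper: return (train, test), sampling test if it is missing."""
    # Nothing to sample if a test split exists or sampling is switched off.
    if test is not None or not sample_missing_splits:
        return list(train), (list(test) if test is not None else None)

    # Pick a random subset of training indices to serve as the new test split.
    indices = list(range(len(train)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(train) * fraction)
    test_idx = set(indices[:n_test])

    # Held-out examples are removed from train so the two splits stay disjoint.
    new_train = [x for i, x in enumerate(train) if i not in test_idx]
    new_test = [x for i, x in enumerate(train) if i in test_idx]
    return new_train, new_test
```

With `sample_missing_splits=False` the sketch returns the training data untouched and no test split, which mirrors the second call in the snippet above, `GLUE_SST2(sample_missing_splits=False)`.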

@alanakbik alanakbik merged commit 9ea0894 into master Aug 10, 2023
1 check passed
@alanakbik
Collaborator

Thanks for fixing this @plonerma!

@alanakbik alanakbik deleted the GH3275-sample_missing_splits-in-corpus-subclasses branch August 10, 2023 13:56