Automatically choose dataset split if none provided #232

Merged: 25 commits merged into main from hn-dataset-as-string on Aug 26, 2022

Conversation

mathemakitten (Contributor):

Previously, evaluator.compute(..., data='imdb', ...) would fail because the data was returned as an object of type datasets.DatasetDict. This PR automatically detects a split if none is given (i.e. when the user passes the dataset in by name and expects the Evaluator to load it, instead of preloading it themselves).

Closes #226.
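For context, a rough sketch of the behaviour being added, assuming a helper built on datasets.get_dataset_split_names (the commit history below mentions switching to it); the helper name and preference order here are illustrative, not the merged code:

from typing import Optional

from datasets import get_dataset_split_names, load_dataset


def load_data_with_default_split(data: str, split: Optional[str] = None):
    """Load a dataset by name and pick a sensible split if none is given."""
    if split is None:
        available = get_dataset_split_names(data)
        # Prefer evaluation-style splits over the train split (order is an assumption).
        for candidate in ("test", "validation", "train"):
            if candidate in available:
                split = candidate
                break
        else:
            raise ValueError(f"No usable split found among {available}")
    return load_dataset(data, split=split)


# e.g. load_data_with_default_split("imdb") returns the "test" split as a
# datasets.Dataset instead of a DatasetDict, so downstream column access works.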

HuggingFaceDocBuilderDev commented Aug 4, 2022:

The documentation is not available anymore as the PR was closed or merged.

@lvwerra (Member) left a comment:

This PR is in pretty good shape already!

I wonder if we should set up a dedicated load_dataset_from_string method that is called before prepare_data if a string is provided. Note that the question-answering and token classification evaluators have overridden compute and prepare_data, so they would also need to be adapted.

Finally, it would be great to add a few tests to make sure this works and doesn't break in the future :)

mathemakitten (Contributor, Author):

Hi @ola13, I actually put the original call to the parent class back in and returned the Dataset object instead, since I noticed that both the QA and token classification evaluators use columns from the dataset object in .compute() later. Let me know what you think!

@ola13 (Contributor) left a comment:

Looks good to me, although I'd wait for @lvwerra to weigh in before merging :)

@lvwerra (Member) left a comment:

Thanks @mathemakitten for working on this! This looks good in general; I just have one concern about the return value of prepare_data. It feels a bit odd to return data although we don't always use it. I suggest decoupling the loading from the preparing a bit:

data = self.load_data(data, ...)
metric_inputs, pipeline_inputs = self.prepare_inputs(data, ...)

That way each method is more specialized and we can avoid the x, y, _ = and _, _, z = statements.

I think this will also make overriding prepare_inputs easier, as one can always assume it receives a dataset. What do you think?
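For illustration, a minimal sketch of that decoupled flow; the class, method bodies, and column defaults below are illustrative, not the evaluator's actual implementation:

from datasets import Dataset, load_dataset


class EvaluatorSketch:
    def load_data(self, data, split=None):
        # Accept either a dataset name or an already-loaded Dataset.
        if isinstance(data, str):
            return load_dataset(data, split=split or "test")
        if isinstance(data, Dataset):
            return data
        raise TypeError("data must be a dataset name (str) or a datasets.Dataset")

    def prepare_inputs(self, data, input_column="text", label_column="label"):
        # Always receives a Dataset here, so column access is safe.
        metric_inputs = {"references": data[label_column]}
        pipeline_inputs = data[input_column]
        return metric_inputs, pipeline_inputs

    def compute(self, data, split=None, **kwargs):
        data = self.load_data(data, split=split)
        metric_inputs, pipeline_inputs = self.prepare_inputs(data)
        ...  # run the pipeline, then compute the metric on metric_inputs

Subclasses such as the QA or token-classification evaluators would then only override prepare_inputs and could always assume they receive a Dataset.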

mathemakitten force-pushed the hn-dataset-as-string branch 2 times, most recently from 110f2a3 to 7bfa6d0 (August 15, 2022 18:56)
@lvwerra (Member) left a comment:
Awesome, generally looks good to me! Mostly left comments regarding the tests. What's left to do:

  • refactor tests
  • rebase
  • make sure CI runs

Then we can merge :)

@lhoestq (Member) left a comment:

Cool! A few comments:

e = evaluator("token-classification")

# Test passing in dataset object
data = load_dataset("conll2003", split="validation[:2]")
(Member):

If you are using load_dataset in your CI, could you set the HF_UPDATE_DOWNLOAD_COUNTS=false environment variable in your CI? This way it won't count towards the download stats on the Hub.

@lvwerra (Member), Aug 19, 2022:

We have this in the conftest already:

import pytest

@pytest.fixture(autouse=True)
def set_update_download_counts_to_false(monkeypatch):
    # don't take tests into account when counting downloads
    monkeypatch.setattr("evaluate.config.HF_UPDATE_DOWNLOAD_COUNTS", False)

Do you think we need an extra one in the CI?

Edit: This should actually be datasets.config and not evaluate.config, right?

mathemakitten (Contributor, Author):

Looking at the other code in config.py just slightly above this, I think evaluate.config is correct?

(Member):

You have to set both ideally ;)

mathemakitten (Contributor, Author):

Done :)
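For reference, a conftest fixture that patches both flags might look like this; it assumes both evaluate.config and datasets.config expose HF_UPDATE_DOWNLOAD_COUNTS, as discussed above:

import pytest


@pytest.fixture(autouse=True)
def set_update_download_counts_to_false(monkeypatch):
    # Don't count test runs towards the download stats on the Hub.
    monkeypatch.setattr("evaluate.config.HF_UPDATE_DOWNLOAD_COUNTS", False)
    monkeypatch.setattr("datasets.config.HF_UPDATE_DOWNLOAD_COUNTS", False)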

@lvwerra (Member) left a comment:

Thanks for moving the tests! A few things we could add to the tests:

  • Set up a few small dummy datasets with different splits (only train, train/test, train/test/valid) and check that the right one is selected by the get_dataset_split method (a sketch of this kind of test follows this list).
  • In addition to just checking that there is no error in test_data_loading, we could also check that the data we get is actually the right one. I would make the test and train samples differ and then check that e.g. the first element is the one we expect.
  • If you don't mind, could you move the datasets from evaluate-ci to evaluate? There's nothing in that org except the links to the module orgs and the datasets with the images for the docs, so we don't have too many evaluate orgs.
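As a sketch of the first point, a parametrized test over a split-preference helper could look like this; choose_split and its preference order are assumptions, not necessarily the merged get_dataset_split logic:

import pytest


def choose_split(available_splits):
    # Hypothetical helper: prefer evaluation-style splits over the train split.
    for candidate in ("test", "validation", "train"):
        if candidate in available_splits:
            return candidate
    raise ValueError(f"No usable split in {available_splits}")


@pytest.mark.parametrize(
    "available_splits, expected",
    [
        (["train"], "train"),
        (["train", "test"], "test"),
        (["train", "test", "validation"], "test"),
    ],
)
def test_choose_split(available_splits, expected):
    assert choose_split(available_splits) == expected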

mathemakitten (Contributor, Author) commented Aug 22, 2022:

Getting the strangest error in CI here: TestQuestionAnsweringEvaluator.test_model_init is failing in pytest with AssertionError: 33.333333333333336 != 0, but running the test suite in debug mode clearly shows the correct test results, and self.data shows the correct two examples. It almost feels like an environment/configuration error... digging deeper.

Edit: resolved by #272.

[Screenshot: Screen Shot 2022-08-22 at 3:29:55 PM]

mathemakitten force-pushed the hn-dataset-as-string branch 3 times, most recently from 7ea0958 to cfa98fc (August 24, 2022 17:22)
@lvwerra (Member) left a comment:

Thanks for pushing this over the finish line! Looks great, just two small comments.

mathemakitten merged commit 6239112 into main on Aug 26, 2022.
mathemakitten deleted the hn-dataset-as-string branch on August 26, 2022 at 18:37.
mathemakitten added a commit that referenced this pull request Sep 23, 2022
* Handle when data is passed as a string without a split selected

* Cleanup

* Add data_split kwarg for Evaluator.compute()

* Change to use get_dataset_split_names instead of pinging the datasets server

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Cleanup

* Docstring and cleanup

* Incorporate changes into QA evaluator

* Move SQuAD checking logic to prepare_data()

* Token classification cases run

* Remove debugging code

* Refactor to move data object outside

* Add tests + misc cleanup

* Preferred order of splits

* Move dev split before train split

* Decouple data loading and preparation

* Formatting

* Remove data_split and cleanup tests

* Fix docs

* Cleanup and refactor tests

* Datasets config attribute

* Update tests to only use first example from dataset and refactor checks for load_data

Co-authored-by: mathemakitten <helen@huggingface.co>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
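With the change merged, passing the dataset name directly to compute should work end to end. A hedged usage sketch follows; the model name and the argument names other than data are recalled from the evaluate docs and should be treated as assumptions:

from evaluate import evaluator

task_evaluator = evaluator("text-classification")
results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",
    data="imdb",  # dataset passed as a string; a split is chosen automatically
    metric="accuracy",
)
print(results)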
Linked issue: Issue with evaluator when the dataset is passed as a string (#226).
5 participants.