Automatically choose dataset split if none provided #232
Conversation
The documentation is not available anymore as the PR was closed or merged.
That PR is in pretty good shape already!
I wonder if we should set up a dedicated `load_dataset_from_string` method that is called before `prepare_data` if a string is provided. Note that the question-answering and token-classification evaluators have overridden `compute` and `prepare_data`, so they would also need to be adapted.
Finally, it would be great to add a few tests to make sure this works and doesn't break in the future :)
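A minimal sketch of what such a helper could look like; the standalone-function form, name, and signature here are illustrative assumptions, not the eventual implementation:

```python
from datasets import Dataset, load_dataset


def load_dataset_from_string(data, split=None):
    # Hypothetical helper: only hit the Hub when the caller passed a name.
    if isinstance(data, str):
        # With split=None this returns a DatasetDict, which is exactly
        # why the PR needs to pick a default split for the user.
        return load_dataset(data, split=split)
    if isinstance(data, Dataset):
        return data
    raise TypeError(f"Expected a dataset name or Dataset, got {type(data)}.")
```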
Hi @ola13, I actually put the original call to the parent class back in and returned the Dataset object instead, since I noticed that both the QA and token classification evaluators use columns from the dataset object in …
Looks good to me, although I'd wait for @lvwerra to weigh in before merging :)
Thanks @mathemakitten for working on this! This looks good in general; I just have one concern about the return value of `prepare_data`. It feels a bit odd to return the data although we don't always use it. I suggest decoupling the loading from the preparing a bit:

```python
data = self.load_data(data, ...)
metric_inputs, pipeline_inputs = self.prepare_inputs(data, ...)
```

That way each method is more specialized and we can avoid the `x, y, _ =` and `_, _, z =` statements. I think this will also make overriding `prepare_inputs` easier, as one can always assume it's a dataset. What do you think?
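A rough sketch of how that decoupling might look; the method bodies, default arguments, and column names below are illustrative assumptions, not the library's actual implementation:

```python
from datasets import Dataset, load_dataset


class Evaluator:
    def load_data(self, data, split=None):
        # Resolve a dataset name into a concrete Dataset object.
        if isinstance(data, str):
            return load_dataset(data, split=split)
        return data

    def prepare_inputs(self, data, input_column="text", label_column="label"):
        # Subclasses overriding this can always assume `data` is a Dataset.
        metric_inputs = {"references": data[label_column]}
        pipeline_inputs = data[input_column]
        return metric_inputs, pipeline_inputs
```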
Force-pushed from 110f2a3 to 7bfa6d0.
Force-pushed from ec12464 to c6bcf77.
Awesome, generally looks good to me! Mostly left comments regarding the tests. What's left to do:
- refactor tests
- rebase
- make sure CI runs
Then we can merge :)
Cool! A few comments:

In `tests/test_dataloaders.py`:

```python
e = evaluator("token-classification")

# Test passing in dataset object
data = load_dataset("conll2003", split="validation[:2]")
```
If you are using `load_dataset` in your CI, could you set the `HF_UPDATE_DOWNLOAD_COUNTS=false` environment variable in your CI? This way it won't count in the download stats on the Hub.
We have this in the conftest already (lines 38 to 41 in d5ecbe4):

```python
@pytest.fixture(autouse=True)
def set_update_download_counts_to_false(monkeypatch):
    # don't take tests into account when counting downloads
    monkeypatch.setattr("evaluate.config.HF_UPDATE_DOWNLOAD_COUNTS", False)
```

Do you think we need an extra one in the CI?

Edit: This should actually be `datasets.config` and not `evaluate.config`, right?
Looking at the other code in `config.py` just slightly above this, I think `evaluate.config` is correct?
You have to set both, ideally ;)
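For reference, a fixture patching both modules might look like the following sketch, mirroring the existing conftest fixture; that `datasets.config` exposes the same attribute name is an assumption here:

```python
import pytest


@pytest.fixture(autouse=True)
def set_update_download_counts_to_false(monkeypatch):
    # Keep test runs out of the Hub's download statistics.
    monkeypatch.setattr("evaluate.config.HF_UPDATE_DOWNLOAD_COUNTS", False)
    monkeypatch.setattr("datasets.config.HF_UPDATE_DOWNLOAD_COUNTS", False)
```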
Done :)
Force-pushed from cee21f3 to ac5153f.
Thanks for moving the tests! A few things we could add to the tests:
- set up a few small dummy datasets with different splits (only train, train/test, train/test/valid) and check that the right one is selected by the `get_dataset_split` method (see the sketch after this list).
- in addition to just checking that there is no error in `test_data_loading`, we could also check that the data we get is actually the right one. I would make the test/train samples different and then check that e.g. the first element is the one we expect.
- if you don't mind, could you move the datasets from `evaluate-ci` to `evaluate`? There's nothing in that org except the links to the module orgs and the datasets with the images for the docs, so we don't have too many evaluate orgs.
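A rough sketch of the first suggestion; `get_dataset_split` is the name used above, but its signature and the preference order assumed here are illustrative:

```python
import pytest
from datasets import Dataset, DatasetDict


def get_dataset_split(dataset_dict):
    # Assumed preference order: evaluation-style splits before train.
    for split in ("test", "validation", "train"):
        if split in dataset_dict:
            return split
    raise ValueError("No usable split found.")


@pytest.mark.parametrize(
    "splits, expected",
    [
        (["train"], "train"),
        (["train", "test"], "test"),
        (["train", "test", "validation"], "test"),
    ],
)
def test_get_dataset_split(splits, expected):
    dummy = DatasetDict({s: Dataset.from_dict({"text": [s]}) for s in splits})
    assert get_dataset_split(dummy) == expected
```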
Force-pushed from ebb1c5c to 11e42e9.
Force-pushed from 1411801 to 23bfbbd.
Getting the strangest error in CI here. Edit: resolved by #272.
Force-pushed from 7ea0958 to cfa98fc.
Thanks for pushing this over the finish line! Looks great, just two small comments.
* Handle when data is passed as a string without a split selected
* Cleanup
* Add data_split kwarg for Evaluator.compute()
* Change to use get_dataset_split_names instead of pinging the datasets server

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Cleanup
* Docstring and cleanup
* Incorporate changes into QA evaluator
* Move SQuAD checking logic to prepare_data()
* Token classification cases run
* Remove debugging code
* Refactor to move data object outside
* Add tests + misc cleanup
* Preferred order of splits
* Move dev split before train split
* Decouple data loading and preparation
* Formatting
* Remove data_split and cleanup tests
* Fix docs
* Cleanup and refactor tests
* Datasets config attribute
* Update tests to only use first example from dataset and refactor checks for load_data

Co-authored-by: mathemakitten <helen@huggingface.co>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Previously, `evaluator.compute(..., data="imdb", ...)` would fail because it returned an object of type `datasets.DatasetDict`. This PR automatically detects a split if none is given (i.e., the user passes the dataset by name and expects the Evaluator to load it, instead of preloading it themselves).

Closes #226.