
Refactor kwargs and configs #188

Merged
merged 35 commits from config-load into main on Sep 22, 2022

Conversation

@lvwerra (Member) commented Jul 18, 2022

This PR reworks the config/kwargs logic for evaluation modules (closes #169). The structure is the following:

  1. The allowed fields and their defaults are defined in a Config (a dataclass) inside the evaluation script.
  2. The defaults can be overwritten by passing them as kwargs to load.
  3. The config_name is also part of the Config and is additionally validated against the allowed_names. This field could also be used to get all allowed config names (Implement get_evaluation_module_config_names() function #138) plus the additional settings.
  4. Currently, the user can also pass the configs to compute, which updates them for the duration of the call. This keeps the change backward compatible but adds yet another way to change configs. From a user perspective it might be easier to have just one way to set configs. What do you think?

To illustrate how this would work I updated the F1-score metric. Let me know what you think @lewtun and @lhoestq!
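
A rough, self-contained sketch of the proposed flow (the class and field names below are illustrative, not the exact API from this PR):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class F1Config:
    """Illustrative config: allowed fields and their defaults live in a dataclass (step 1)."""

    config_name: str = "default"
    allowed_config_names: List[str] = field(default_factory=lambda: ["default", "multilabel"])
    average: str = "binary"
    pos_label: int = 1

    def __post_init__(self):
        # Step 3: the chosen config name is validated against the allowed names.
        if self.config_name not in self.allowed_config_names:
            raise ValueError(
                f"Unknown config name {self.config_name!r}, expected one of {self.allowed_config_names}"
            )


# Step 2: kwargs passed to load() would overwrite the dataclass defaults, roughly like this.
config = F1Config(average="macro")

# Step 4: kwargs passed to compute() would update the config only for the duration of that call.
print(config)
```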

@lvwerra lvwerra requested review from lewtun and lhoestq July 18, 2022 13:52
@HuggingFaceDocBuilderDev commented Jul 18, 2022

The documentation is not available anymore as the PR was closed or merged.

@lewtun (Member) left a comment


Thanks for working on this feature @lvwerra - the API design looks great to me and I think it will make the evaluation UX much better!

Since the feature is backwards compatible, I don't see any problem with the current proposal - happy to review again once the PR is ready for another pass (I've just left minor comments)

class Config:
    """Base class to store the configuration used for the evaluation module."""

    name = "default"

Maybe add a comment (or docstring) that explains what name and allowed_names are and how they're related?
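
For instance, something along these lines (wording is purely illustrative):

```python
class Config:
    """Base class to store the configuration used for the evaluation module.

    Attributes:
        name: The configuration selected by the user, e.g. "default".
        allowed_names: The configuration names the module accepts; `name` must be
            one of them (subclasses typically override this list).
    """

    name = "default"
```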

@@ -54,6 +77,7 @@ class EvaluationModuleInfo:
    streamable: bool = False
    format: Optional[str] = None
    module_type: str = "metric"  # deprecate this in the future

Unrelated to this PR, but one suggestion would be to add a deprecation warning so we can alert users when this will be removed / remind ourselves to remove it :)
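
For example, something like this with the standard warnings module (illustrative, not code from this PR):

```python
import warnings

# Illustrative sketch: emit a visible warning whenever the legacy attribute is used,
# so users are alerted and maintainers are reminded to remove it later.
warnings.warn(
    "`module_type` is deprecated and will be removed in a future release of evaluate.",
    FutureWarning,
    stacklevel=2,
)
```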

@lhoestq (Member) left a comment


Cool ! I added a few comments about the design, to try to make it more intuitive and practical

metrics/f1/f1.py Outdated

class F1Config(Config):

    config_name: str = "default"
    allowed_config_names: List[str] = field(default_factory=lambda: ["default", "multilabel"])

Maybe make this a class attribute ? And all in caps to make it clear that it's not a parameter ?

This can be moved to a class attribute of Metric instead btw
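
i.e. roughly something like this (a standalone sketch; the real evaluate.Metric base class is omitted so it runs on its own):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class F1Config:
    config_name: str = "default"
    average: str = "binary"


class F1:
    # Sketch of the suggestion: the allowed names become a class-level constant in caps,
    # so it reads as part of the module definition rather than a user-facing parameter.
    # (It could equally live on the shared Metric base class.)
    ALLOWED_CONFIG_NAMES: List[str] = ["default", "multilabel"]
```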

metrics/f1/f1.py Outdated

@dataclass
class F1Config(Config):

    config_name: str = "default"

You call it config_name here but in Config it is called name

@@ -54,6 +77,7 @@ class EvaluationModuleInfo:
    streamable: bool = False
    format: Optional[str] = None
    module_type: str = "metric"  # deprecate this in the future
    config: Optional[Config] = Config()

You can set it to None by default IMO
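
i.e. roughly (a standalone sketch rather than the exact diff):

```python
from dataclasses import dataclass
from typing import Optional


class Config:
    """Base class to store the configuration used for the evaluation module."""

    name = "default"


@dataclass
class EvaluationModuleInfo:
    streamable: bool = False
    format: Optional[str] = None
    module_type: str = "metric"  # deprecate this in the future
    # Defaulting to None avoids sharing a single Config() instance across every
    # EvaluationModuleInfo object.
    config: Optional[Config] = None
```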

metrics/f1/f1.py Outdated

@@ -114,11 +134,17 @@ def _info(self):
                    "references": datasets.Value("int32"),
                }
            ),
            config=F1Config(),

It feels weird to instantiate the default one here. Also, what if the features depend on the config, how would we access the config params from here ? Maybe _info() can take the config as input instead

And you can add the class attribute BUILDER_CLASS to instantiate the config before passing it to _info

@lvwerra (Member Author) commented Aug 5, 2022

Thanks for your feedback! @lhoestq I reworked the logic based on your feedback. Is that what you had in mind?

@lhoestq (Member) left a comment


Nice ! Love it this way :) more comments

metrics/f1/f1.py Outdated

@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class F1(evaluate.Metric):
    def _info(self):

    BUILDER_CLASS = F1Config()

Sorry I meant CONFIG_CLASS. And it doesn't have to be instantiated.

Suggested change:
-    BUILDER_CLASS = F1Config()
+    CONFIG_CLASS = F1Config

This way you don't carry the same config for all instances in self.BUILDER_CLASS. And instead of

        self.BUILDER_CLASS.update(kwargs)
        info = self._info(self.BUILDER_CLASS)

you can do

        info = self._info(self.CONFIG_CLASS(**kwargs))
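
Put together, a rough standalone sketch of that pattern (the evaluate.Metric base class and the real F1 features are left out so the snippet runs on its own):

```python
from dataclasses import dataclass


@dataclass
class F1Config:
    average: str = "binary"


class F1:
    # The module only declares its config class; each instance builds its own config
    # from the defaults plus any user kwargs and hands it to _info.
    CONFIG_CLASS = F1Config

    def __init__(self, **kwargs):
        self.config = self.CONFIG_CLASS(**kwargs)
        self.info = self._info(self.config)

    def _info(self, config):
        # Features (or anything else returned here) can now depend on the config.
        return {"average": config.average}


print(F1(average="macro").info)  # {'average': 'macro'}
```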

@@ -436,7 +452,12 @@ def compute(self, *, predictions=None, references=None, **kwargs) -> Optional[dict]:

        inputs = {input_name: self.data[input_name] for input_name in self._feature_names()}
        with temp_seed(self.seed):
            output = self._compute(**inputs, **compute_kwargs)
        config_state = deepcopy(self.config)
        self.config.update(compute_kwargs)

Maybe use a temporary assignment here ? Otherwise calling compute twice, first with kwargs and then without, would apply the kwargs to the second call

Member Author

That's why after the call the config is reverted: self._module_info.config = config_state in L460. Or do you see a flaw in that logic?

Member

If there's an error during _compute, the config is never restored; you can use try: ... finally: ...
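
A minimal sketch of that idea (dummy classes, not the actual module.py code):

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class DummyConfig:
    average: str = "binary"

    def update(self, kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class DummyMetric:
    def __init__(self):
        self.config = DummyConfig()

    def compute(self, **compute_kwargs):
        # Apply compute-time kwargs temporarily; the finally block guarantees the
        # previous config is restored even if the computation raises.
        config_state = deepcopy(self.config)
        try:
            self.config.update(compute_kwargs)
            return self._compute()
        finally:
            self.config = config_state

    def _compute(self):
        return f"computed with average={self.config.average}"


metric = DummyMetric()
print(metric.compute(average="macro"))  # uses the temporary value
print(metric.config.average)            # back to "binary" afterwards
```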

@dleve123 (Contributor) left a comment


Was just looking for this functionality in evaluate and stumbled upon this PR. Looks awesome, left one comment that I believe enhances the type hints. Thanks for the awesome library 👍

metrics/f1/f1.py (resolved)
@lvwerra lvwerra marked this pull request as ready for review September 1, 2022 07:35
@lhoestq (Member) left a comment


Awesome ! LGTM :)

Do you know how users can get some docstrings about the config parameters ?
This would be useful to document IMO (can be done in a subsequent PR)

metrics/bleu/bleu.py (resolved)
src/evaluate/module.py (resolved)
@lvwerra (Member Author) commented Sep 1, 2022

> Do you know how users can get some docstrings about the config parameters ?
> This would be useful to document IMO (can be done in a subsequent PR)

They can get the configs with:

metric = evaluate.load("some_metric")
print(metric.config)

Is that what you had in mind? Or do you want a method that extends the module's docstring automatically with that information?

@lvwerra (Member Author) commented Sep 1, 2022

@sashavor I also refactored toxicity a bit to better fit with the other modules. Let me know if you agree with the changes (as well as all the others :))!

I suggest merging this PR when we also have some time to make PRs to the community metrics as this is a breaking change to their modules if they install from main.

@lhoestq (Member) commented Sep 1, 2022

> Is that what you had in mind? Or do you want a method that extends the module's docstring automatically with that information?

It's good this way ! Maybe this can be mentioned in the docs

@sashavor (Contributor) left a comment


This is super cool! ⭐
Makes things a lot clearer 👓

@lewtun (Member) left a comment


This is a great piece of refactoring - nice!

I left some nits on the quick tour and went through the source changes - overall it LGTM. One feature request would be to have a get_metric_config_names() function that is similar to the one used in datasets: https://huggingface.co/docs/datasets/v2.4.0/en/package_reference/loading_methods#datasets.get_dataset_config_names

This is handy when you want to programmatically get all of the configs associated with a metric.
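
A rough sketch of what such a helper could look like (hypothetical; it assumes the allowed_config_names field discussed earlier in this PR is exposed on the loaded module's config):

```python
from typing import List

import evaluate


def get_metric_config_names(path: str) -> List[str]:
    # Hypothetical helper, not part of the evaluate API: load the module once and
    # read the allowed config names off its config object.
    metric = evaluate.load(path)
    return list(getattr(metric.config, "allowed_config_names", []))
```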

docs/source/a_quick_tour.mdx (resolved)

```python
>>> metric = evaluate.load("accuracy", normalize=False)
```

Not sure what the convention is for the evaluate docs, but it's quite handy if all the code snippets "just work", so I suggest including import evaluate
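
e.g. the snippet above would then read:

```python
>>> import evaluate
>>> metric = evaluate.load("accuracy", normalize=False)
```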

docs/source/a_quick_tour.mdx (resolved)
@lvwerra (Member Author) commented Sep 22, 2022

After a few hours of my life dedicated to finding out why the tests suddenly fail, I figured out the issue: Transformers made a release, so the example scripts associated with the new version use evaluate.load instead of datasets.load_metric. Since we execute the scripts in a subprocess, the load functions patched by @use_local_metrics do not apply, and thus the unfixed metrics from the Hub are loaded.

Not sure how to easily fix these tests, and it's probably a very niche use case, so I'll merge into main, which should automatically fix the issue. If not, I'll revert and think a bit more about this.
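
For context, a tiny illustration of why the patching does not reach the subprocess (hypothetical snippet, not the actual test code):

```python
import subprocess
import sys
from unittest import mock

import evaluate

# A monkeypatch only rewires the current Python process: inside the `with` block the
# patched load is used here, but a script executed in a subprocess imports the real,
# unpatched evaluate.load from the installed package.
with mock.patch("evaluate.load", lambda *args, **kwargs: "patched"):
    print(evaluate.load("accuracy"))  # -> "patched" (same process)
    subprocess.run(
        [sys.executable, "-c", "import evaluate; print(evaluate.load)"],
        check=True,
    )  # prints the unpatched function
```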

@lvwerra lvwerra merged commit e4a2724 into main Sep 22, 2022
@lvwerra lvwerra deleted the config-load branch September 22, 2022 13:10
lvwerra added a commit that referenced this pull request Sep 22, 2022
lvwerra added a commit that referenced this pull request Sep 22, 2022
Revert "Refactor kwargs and configs (#188)"

This reverts commit e4a2724.
@lvwerra lvwerra restored the config-load branch September 22, 2022 14:31
Successfully merging this pull request may close these issues:

Move kwargs from compute to a config pass during load