
Refactor kwargs and configs #188

Merged
merged 35 commits from config-load into main on Sep 22, 2022

Conversation

@lvwerra (Member) commented Jul 18, 2022

This PR reworks the config/kwargs logic for evaluation modules (closes #169). The structure is the following:

  1. The allowed fields and their defaults are defined in a Config (a dataclass) inside the evaluation script.
  2. The defaults can be overwritten by passing them as kwargs to load.
  3. The config_name is also part of the Config and is additionally validated against the allowed_names. This field could also be used to get all allowed config names (Implement get_evaluation_module_config_names() function #138) plus the additional settings.
  4. Currently, the user can also pass the configs to compute, which updates them for the duration of the call. This keeps the change backward compatible but adds yet another way to change configs. From a user perspective it might be easier to have just one way to set configs. What do you think?

To illustrate how this would work I updated the F1-score metric. Let me know what you think @lewtun and @lhoestq!
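
A rough, self-contained sketch of the proposed flow (the class and field names below are illustrative, not the exact API from this PR):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class F1Config:
    """Illustrative config: allowed fields and their defaults live in a dataclass (step 1)."""

    config_name: str = "default"
    allowed_config_names: List[str] = field(default_factory=lambda: ["default", "multilabel"])
    average: str = "binary"
    pos_label: int = 1

    def __post_init__(self):
        # Step 3: the chosen config name is validated against the allowed names.
        if self.config_name not in self.allowed_config_names:
            raise ValueError(
                f"Unknown config name {self.config_name!r}, expected one of {self.allowed_config_names}"
            )


# Step 2: kwargs passed to load() would overwrite the dataclass defaults, roughly like this.
config = F1Config(average="macro")

# Step 4: kwargs passed to compute() would update the config only for the duration of that call.
print(config)
```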

@lvwerra lvwerra requested review from lewtun and lhoestq July 18, 2022 13:52
@HuggingFaceDocBuilderDev commented Jul 18, 2022

The documentation is not available anymore as the PR was closed or merged.

@lewtun (Member) left a comment


Thanks for working on this feature @lvwerra - the API design looks great to me and I think it will make the evaluation UX much better!

Since the feature is backwards compatible, I don't see any problem with the current proposal - happy to review again once the PR is ready for another pass (I've just left minor comments)

class Config:
    """Base class to store the configuration used for the evaluation module."""

    name = "default"

Maybe add a comment (or docstring) that explains what name and allowed_names are and how they're related?
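
For instance, something along these lines (wording is purely illustrative):

```python
class Config:
    """Base class to store the configuration used for the evaluation module.

    Attributes:
        name: The configuration selected by the user, e.g. "default".
        allowed_names: The configuration names the module accepts; `name` must be
            one of them (subclasses typically override this list).
    """

    name = "default"
```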

@@ -54,6 +77,7 @@ class EvaluationModuleInfo:
    streamable: bool = False
    format: Optional[str] = None
    module_type: str = "metric"  # deprecate this in the future

Unrelated to this PR, but one suggestion would be to add a deprecation warning so we can alert users when this will be removed / remind ourselves to remove it :)
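
For example, something like this with the standard warnings module (illustrative, not code from this PR):

```python
import warnings

# Illustrative sketch: emit a visible warning whenever the legacy attribute is used,
# so users are alerted and maintainers are reminded to remove it later.
warnings.warn(
    "`module_type` is deprecated and will be removed in a future release of evaluate.",
    FutureWarning,
    stacklevel=2,
)
```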

@lhoestq (Member) left a comment


Cool ! I added a few comments about the design, to try to make it more intuitive and practical

metrics/f1/f1.py Outdated

class F1Config(Config):

    config_name: str = "default"
    allowed_config_names: List[str] = field(default_factory=lambda: ["default", "multilabel"])

Maybe make this a class attribute ? And all in caps to make it clear that it's not a parameter ?

This can be moved to a class attribute of Metric instead btw
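
i.e. roughly something like this (a standalone sketch; the real evaluate.Metric base class is omitted so it runs on its own):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class F1Config:
    config_name: str = "default"
    average: str = "binary"


class F1:
    # Sketch of the suggestion: the allowed names become a class-level constant in caps,
    # so it reads as part of the module definition rather than a user-facing parameter.
    # (It could equally live on the shared Metric base class.)
    ALLOWED_CONFIG_NAMES: List[str] = ["default", "multilabel"]
```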

metrics/f1/f1.py Outdated

@dataclass
class F1Config(Config):

    config_name: str = "default"

You call it config_name here but in Config it is called name

@@ -54,6 +77,7 @@ class EvaluationModuleInfo:
    streamable: bool = False
    format: Optional[str] = None
    module_type: str = "metric"  # deprecate this in the future
    config: Optional[Config] = Config()

You can set it to None by default IMO
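
i.e. roughly (a standalone sketch rather than the exact diff):

```python
from dataclasses import dataclass
from typing import Optional


class Config:
    """Base class to store the configuration used for the evaluation module."""

    name = "default"


@dataclass
class EvaluationModuleInfo:
    streamable: bool = False
    format: Optional[str] = None
    module_type: str = "metric"  # deprecate this in the future
    # Defaulting to None avoids sharing a single Config() instance across every
    # EvaluationModuleInfo object.
    config: Optional[Config] = None
```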

metrics/f1/f1.py Outdated

@@ -114,11 +134,17 @@ def _info(self):
                    "references": datasets.Value("int32"),
                }
            ),
            config=F1Config(),

It feels weird to instantiate the default one here. Also, what if the features depend on the config, how would we access the config params from here ? Maybe _info() can take the config as input instead

And you can add the class attribute BUILDER_CLASS to instantiate the config before passing it to _info

@lvwerra (Member Author) commented Aug 5, 2022

Thanks for your feedback! @lhoestq I reworked the logic based on your feedback. Is that what you had in mind?

@lhoestq (Member) left a comment


Nice ! Love it this way :) more comments

metrics/f1/f1.py Outdated

@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class F1(evaluate.Metric):
    def _info(self):

    BUILDER_CLASS = F1Config()

Sorry I meant CONFIG_CLASS. And it doesn't have to be instantiated.

Suggested change:
-    BUILDER_CLASS = F1Config()
+    CONFIG_CLASS = F1Config

This way you don't carry the same config for all instances in self.BUILDER_CLASS. And instead of

        self.BUILDER_CLASS.update(kwargs)
        info = self._info(self.BUILDER_CLASS)

you can do

        info = self._info(self.CONFIG_CLASS(**kwargs))
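
Put together, a rough standalone sketch of that pattern (the evaluate.Metric base class and the real F1 features are left out so the snippet runs on its own):

```python
from dataclasses import dataclass


@dataclass
class F1Config:
    average: str = "binary"


class F1:
    # The module only declares its config class; each instance builds its own config
    # from the defaults plus any user kwargs and hands it to _info.
    CONFIG_CLASS = F1Config

    def __init__(self, **kwargs):
        self.config = self.CONFIG_CLASS(**kwargs)
        self.info = self._info(self.config)

    def _info(self, config):
        # Features (or anything else returned here) can now depend on the config.
        return {"average": config.average}


print(F1(average="macro").info)  # {'average': 'macro'}
```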

@@ -436,7 +452,12 @@ def compute(self, *, predictions=None, references=None, **kwargs) -> Optional[dict]:

        inputs = {input_name: self.data[input_name] for input_name in self._feature_names()}
        with temp_seed(self.seed):
            output = self._compute(**inputs, **compute_kwargs)
        config_state = deepcopy(self.config)
        self.config.update(compute_kwargs)

Maybe use a temporary assignment here ? Otherwise calling compute twice, first with kwargs and then without, would apply the kwargs to the second call

Member Author

That's why after the call the config is reverted: self._module_info.config = config_state in L460. Or do you see a flaw in that logic?

Member

If there's an error during _compute, the config is never restored; you can use try: ... finally: ...
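
A minimal sketch of that idea (dummy classes, not the actual module.py code):

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class DummyConfig:
    average: str = "binary"

    def update(self, kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class DummyMetric:
    def __init__(self):
        self.config = DummyConfig()

    def compute(self, **compute_kwargs):
        # Apply compute-time kwargs temporarily; the finally block guarantees the
        # previous config is restored even if the computation raises.
        config_state = deepcopy(self.config)
        try:
            self.config.update(compute_kwargs)
            return self._compute()
        finally:
            self.config = config_state

    def _compute(self):
        return f"computed with average={self.config.average}"


metric = DummyMetric()
print(metric.compute(average="macro"))  # uses the temporary value
print(metric.config.average)            # back to "binary" afterwards
```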

@dleve123 (Contributor) left a comment


Was just looking for this functionality in evaluate and stumbled upon this PR. Looks awesome, left one comment that I believe enhances the type hints. Thanks for the awesome library 👍

metrics/f1/f1.py (resolved)
@lvwerra lvwerra marked this pull request as ready for review September 1, 2022 07:35
@lhoestq (Member) left a comment


Awesome ! LGTM :)

Do you know how users can get some docstrings about the config parameters ?
This would be useful to document IMO (can be done in a subsequent PR)

metrics/bleu/bleu.py (resolved)
src/evaluate/module.py (resolved)
@lvwerra (Member Author) commented Sep 1, 2022

> Do you know how users can get some docstrings about the config parameters ?
> This would be useful to document IMO (can be done in a subsequent PR)

They can get the configs with:

metric = evaluate.load("some_metric")
print(metric.config)

Is that what you had in mind? Or do you want a method that extends the module's docstring automatically with that information?

@lvwerra (Member Author) commented Sep 1, 2022

@sashavor I also refactored toxicity a bit to better fit with the other modules. Let me know if you agree with the changes (as well as all the others :))!

I suggest merging this PR when we also have some time to make PRs to the community metrics as this is a breaking change to their modules if they install from main.

@lhoestq (Member) commented Sep 1, 2022

> Is that what you had in mind? Or do you want a method that extends the module's docstring automatically with that information?

It's good this way ! Maybe this can be mentioned in the docs

@sashavor (Contributor) left a comment


This is super cool! ⭐
Makes things a lot clearer 👓

@lewtun (Member) left a comment


This is a great piece of refactoring - nice!

I left some nits on the quick tour and went through the source changes - overall it LGTM. One feature request would be to have a get_metric_config_names() function that is similar to the one used in datasets: https://huggingface.co/docs/datasets/v2.4.0/en/package_reference/loading_methods#datasets.get_dataset_config_names

This is handy when you want to programmatically get all of the configs associated with a metric.
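
A rough sketch of what such a helper could look like (hypothetical; it assumes the allowed_config_names field discussed earlier in this PR is exposed on the loaded module's config):

```python
from typing import List

import evaluate


def get_metric_config_names(path: str) -> List[str]:
    # Hypothetical helper, not part of the evaluate API: load the module once and
    # read the allowed config names off its config object.
    metric = evaluate.load(path)
    return list(getattr(metric.config, "allowed_config_names", []))
```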

docs/source/a_quick_tour.mdx (resolved)

```python
>>> metric = evaluate.load("accuracy", normalize=False)
```

Not sure what the convention is for the evaluate docs, but it's quite handy if all the code snippets "just work", so I suggest including import evaluate
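
e.g. the snippet above would then read:

```python
>>> import evaluate
>>> metric = evaluate.load("accuracy", normalize=False)
```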

docs/source/a_quick_tour.mdx (resolved)
@lvwerra (Member Author) commented Sep 22, 2022

After a few hours of my life dedicated to finding out why the tests suddenly fail, I figured out the issue: Transformers made a release, so the example scripts associated with the new version use evaluate.load instead of datasets.load_metric. Since we execute the scripts in a subprocess, the load functions patched by @use_local_metrics do not apply, and thus the unfixed metrics from the Hub are loaded.

Not sure how to easily fix these tests, and it's probably a very niche use case, so I'll merge into main, which should automatically fix the issue. If not, I'll revert and think a bit more about this.
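
For context, a tiny illustration of why the patching does not reach the subprocess (hypothetical snippet, not the actual test code):

```python
import subprocess
import sys
from unittest import mock

import evaluate

# A monkeypatch only rewires the current Python process: inside the `with` block the
# patched load is used here, but a script executed in a subprocess imports the real,
# unpatched evaluate.load from the installed package.
with mock.patch("evaluate.load", lambda *args, **kwargs: "patched"):
    print(evaluate.load("accuracy"))  # -> "patched" (same process)
    subprocess.run(
        [sys.executable, "-c", "import evaluate; print(evaluate.load)"],
        check=True,
    )  # prints the unpatched function
```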

@lvwerra lvwerra merged commit e4a2724 into main Sep 22, 2022
@lvwerra lvwerra deleted the config-load branch September 22, 2022 13:10
lvwerra added a commit that referenced this pull request Sep 22, 2022
lvwerra added a commit that referenced this pull request Sep 22, 2022
Revert "Refactor kwargs and configs (#188)"

This reverts commit e4a2724.
@lvwerra lvwerra restored the config-load branch September 22, 2022 14:31
Successfully merging this pull request may close these issues:

Move kwargs from compute to a config pass during load