Refactor kwargs and configs #188
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for working on this feature @lvwerra - the API design looks great to me and I think it will make the evaluation UX much better!
Since the feature is backwards compatible, I don't see any problem with the current proposal - happy to review again once the PR is ready for another pass (I've just left minor comments)
src/evaluate/info.py (outdated)

```python
class Config:
    """Base class to store the configuration used for the evaluation module."""

    name = "default"
```
Maybe add a comment (or docstring) that explains what `name` and `allowed_names` are and how they're related?
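For illustration, such a docstring could look roughly like this (the wording is just a sketch, and `allowed_names` is assumed to live on the same class):

```python
class Config:
    """Base class to store the configuration used for the evaluation module.

    `name` selects which configuration of the module is used and must be one
    of the entries in `allowed_names`, the list of configuration names the
    module accepts.
    """

    name = "default"
    allowed_names = ["default"]
```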
```diff
@@ -54,6 +77,7 @@ class EvaluationModuleInfo:
     streamable: bool = False
     format: Optional[str] = None
     module_type: str = "metric"  # deprecate this in the future
```
Unrelated to this PR, but one suggestion would be to add a deprecation warning so we can alert users when this will be removed / remind ourselves to remove it :)
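For example, something like this (just a sketch; the exact message and where it is emitted are placeholders, not part of this PR):

```python
import warnings

# e.g. wherever `module_type` is read or set:
warnings.warn(
    "`module_type` is deprecated and will be removed in a future release of `evaluate`.",
    FutureWarning,
    stacklevel=2,
)
```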
Cool! I added a few comments about the design, to try to make it more intuitive and practical.
metrics/f1/f1.py (outdated)

```python
class F1Config(Config):

    config_name: str = "default"
    allowed_config_names: List[str] = field(default_factory=lambda: ["default", "multilabel"])
```
Maybe make this a class attribute? And all in caps, to make it clear that it's not a parameter?
This can be moved to a class attribute of `Metric` instead, btw.
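Under that suggestion, the snippet could look roughly like this (a sketch; it assumes the `Config` base class from this PR's `src/evaluate/info.py`):

```python
from dataclasses import dataclass

import evaluate
from evaluate.info import Config  # added in this PR


@dataclass
class F1Config(Config):
    # per-instance parameter: which configuration is selected
    config_name: str = "default"


class F1(evaluate.Metric):
    # class-level constant, all caps to signal that it is not a parameter
    ALLOWED_CONFIG_NAMES = ["default", "multilabel"]
```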
metrics/f1/f1.py (outdated)

```python
@dataclass
class F1Config(Config):

    config_name: str = "default"
```
You call it `config_name` here, but in `Config` it is called `name`.
src/evaluate/info.py (outdated)

```diff
@@ -54,6 +77,7 @@ class EvaluationModuleInfo:
     streamable: bool = False
     format: Optional[str] = None
     module_type: str = "metric"  # deprecate this in the future
+    config: Optional[Config] = Config()
```
You can set it to `None` by default IMO.
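i.e. the suggestion amounts to:

```diff
-    config: Optional[Config] = Config()
+    config: Optional[Config] = None
```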
metrics/f1/f1.py (outdated)

```diff
@@ -114,11 +134,17 @@ def _info(self):
                 "references": datasets.Value("int32"),
             }
         ),
+        config=F1Config(),
```
It feels weird to instantiate the default one here. Also, what if the features depend on the config? How would we access the config params from here? Maybe `_info()` could take the config as input instead.
And you can add the class attribute `BUILDER_CLASS` to instantiate the config before passing it to `_info`.
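A sketch of what that could look like (hypothetical; `BUILDER_CLASS` gets renamed to `CONFIG_CLASS` in a later comment, and the `config` field on the info is the one added by this PR):

```python
from dataclasses import dataclass

import datasets
import evaluate
from evaluate.info import Config  # added in this PR


@dataclass
class F1Config(Config):
    config_name: str = "default"


class F1(evaluate.Metric):
    # holds the config class; the base class instantiates it before calling _info
    BUILDER_CLASS = F1Config

    def _info(self, config):
        # the features (and anything else in the info) can now depend on the config
        return evaluate.MetricInfo(
            description="F1 score",
            citation="",
            features=datasets.Features(
                {
                    "predictions": datasets.Value("int32"),
                    "references": datasets.Value("int32"),
                }
            ),
            config=config,  # `config` field on the info is added by this PR
        )
```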
Thanks for your feedback @lhoestq! I reworked the logic based on your comments. Is that what you had in mind?
Nice! Love it this way :) A few more comments below.
metrics/f1/f1.py (outdated)

```python
@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class F1(evaluate.Metric):
    BUILDER_CLASS = F1Config()

    def _info(self):
```
Sorry, I meant `CONFIG_CLASS`. And it doesn't have to be instantiated:

```diff
-    BUILDER_CLASS = F1Config()
+    CONFIG_CLASS = F1Config
```

This way you don't carry the same config for all instances in `self.BUILDER_CLASS`. And instead of

```python
self.BUILDER_CLASS.update(kwargs)
info = self._info(self.BUILDER_CLASS)
```

you can do

```python
info = self._info(self.CONFIG_CLASS(**kwargs))
```
```diff
@@ -436,7 +452,12 @@ def compute(self, *, predictions=None, references=None, **kwargs) -> Optional[dict]:
     inputs = {input_name: self.data[input_name] for input_name in self._feature_names()}
-    with temp_seed(self.seed):
-        output = self._compute(**inputs, **compute_kwargs)
+    config_state = deepcopy(self.config)
+    self.config.update(compute_kwargs)
```
Maybe use a temporary assignment here? Otherwise calling `compute` twice, first with kwargs and then without, would apply the kwargs to the second call.
That's why the config is reverted after the call: `self._module_info.config = config_state` in L460. Or do you see a flaw in that logic?
If there's an error during `_compute`, then it's not going back to normal; you can use `try: ... finally: ...`.
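A sketch of the suggested pattern, as a fragment of `compute` using the names from the diff above (it assumes `deepcopy`, `temp_seed`, `inputs`, and `compute_kwargs` from the surrounding code):

```python
config_state = deepcopy(self.config)
try:
    # apply the per-call kwargs via the config for the duration of this call
    self.config.update(compute_kwargs)
    with temp_seed(self.seed):
        output = self._compute(**inputs)
finally:
    # restore the previous config even if _compute raises
    self._module_info.config = config_state
```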
Was just looking for this functionality in evaluate and stumbled upon this PR. Looks awesome, left one comment that I believe enhances the type hints. Thanks for the awesome library 👍
Co-authored-by: Daniel Levenson <dleve123@gmail.com>
Awesome! LGTM :)

Do you know how users can get some docstrings about the config parameters? This would be useful to document IMO (can be done in a subsequent PR).
They can get the configs with:

```python
metric = evaluate.load("some_metric")
print(metric.config)
```

Is that what you had in mind? Or do you want a method that extends the module's docstring automatically with that information?
@sashavor I also refactored … I suggest merging this PR when we also have some time to make PRs to the community metrics, as this is a breaking change to their modules if they install from `main`.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
It's good this way! Maybe this can be mentioned in the docs.
This is super cool! ⭐
Makes things a lot clearer 👓
This is a great piece of refactoring - nice!

I left some nits on the quick tour and went through the source changes - overall it LGTM. One feature request would be to have a `get_metric_config_names()` function that is similar to the one used in `datasets`: https://huggingface.co/docs/datasets/v2.4.0/en/package_reference/loading_methods#datasets.get_dataset_config_names

This is handy when you want to programmatically get all of the configs associated with a metric.
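Usage of such a helper might look like this (hypothetical API, mirroring `datasets.get_dataset_config_names`; the function does not exist yet):

```python
>>> from evaluate import get_evaluation_module_config_names  # hypothetical
>>> get_evaluation_module_config_names("f1")
['default', 'multilabel']
```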
```python
>>> metric = evaluate.load("accuracy", normalize=False)
```
Not sure what the convention is for the `evaluate` docs, but it's quite handy if all the code snippets "just work", so I suggest including `import evaluate`.
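i.e. the snippet above would become:

```python
>>> import evaluate
>>> metric = evaluate.load("accuracy", normalize=False)
```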
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
After a few hours of my life dedicated to finding the reason the tests suddenly fail, I figured out the issue: Transformers made a release, so the new example scripts associated with the new version use … Not sure how to easily fix these tests, and it's probably a very niche use-case, so I'll merge into main, which should automatically fix the issue. If not, I'll revert and think a bit more about this.
This PR reworks the config/kwargs logic for evaluation modules (closes #169). The structure is the following:

- The settings of a module are stored in a `Config` (a `dataclass`) inside the evaluation script.
- The config values can be overridden with kwargs passed to `load`.
- The `config_name` is also part of the `Config` and is in addition tested against the `allowed_names`. This field could also be used to get all allowed config names (Implement `get_evaluation_module_config_names()` function #138) plus the additional settings.
- Kwargs can also be passed to `compute`, which updates them for the duration of the call. This makes this change backward compatible but adds more ways to change configs. From a user perspective it might be easier to just have one way to set configs. What do you think?

To illustrate how this would work I updated the F1-score metric. Let me know what you think @lewtun and @lhoestq!
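For illustration, the intended UX could look roughly like this (a sketch; the exact parameter names depend on the final F1 module, and `average`/`multilabel` follow the F1 example above):

```python
>>> import evaluate

>>> # select a config and set parameters when loading the module
>>> f1 = evaluate.load("f1", config_name="multilabel", average="macro")

>>> # ...or override a parameter for the duration of a single compute call
>>> f1.compute(predictions=[[0, 1], [1, 1]], references=[[0, 1], [1, 0]], average="micro")
```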