Add `combine` to compose multiple evaluations #150
Conversation
Tbh I'm not sold on the name - won't people get confused about this being a different kind of "ensemble"? I'd call it something else.
Btw I think `EvaluationModule` is not a good name - I'd like to discuss this more (I missed this suggestion in #40, sorry, and I did not approve #42). In particular, it feels unnatural to me to understand what it's supposed to represent, especially when implementing a metric. Maybe `Metric` can be a subclass of `EvaluationModule`, if you really need the high-level abstraction.
Yes, I am also not totally happy with the name yet.
Ok, I see! I'm fine with having it. The idea is to make it easy for anyone to read a metric/comparison/measurement script. I think it's easier to understand when we use more relevant names. Though it also depends whether it's compatible with your plans regarding the abstractions/separations of those types of evaluations.
Following the discussions above I added a `combine` function. The naming is still open for discussion. Let me know what you think.
@douwekiela picked up on all of the typos, otherwise the logic looks good!
Cool! More comments, mostly coming from the open-source experience I've gained over the past two years. I'm happy to share these thoughts, but please ignore them if they slow you down too much.
# collect the module names that appear more than once
duplicate_names = [
    item for item, count in collections.Counter(self.evaluation_module_names).items() if count > 1
]
# per-name counter, used later to build an index that disambiguates duplicated results
duplicate_counter = {name: 0 for name in duplicate_names}
It's also possible to simply raise an error for now, if we need more time to think about how to handle this case.
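A minimal sketch of that stricter alternative, reusing the `duplicate_names` list from the snippet above (the error message wording is hypothetical):

```python
# Fail fast instead of silently prefixing duplicated module names
if duplicate_names:
    raise ValueError(f"Expected all evaluation module names to be unique, but got duplicates: {duplicate_names}")
```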
As a user, it's a bit unclear to me that it would automatically use prefixes when there are name conflicts, but no prefixes if there are no conflicts. This is the kind of black magic we usually try to avoid - though I don't have a strong opinion on this very specific case, since it would be easy to explain in the docs. What do you think?
An alternative could be to ask users to name the metrics:

combine({
    "metric_1": metric_1,
    "metric_2": metric_2,
})
What do you think about adding a warning when prefixes are added:

f"The key '{key}' occurs in several evaluation results. Prefixing the key with '{prefix}'."
The user can already use a dict in the current state of the PR (see the last example in the initial comment). What do you think?
Actually I noticed that even if you provide a dictionary, it may not use a prefix. Therefore the logic is always "use a prefix if there is a name conflict", even with a dictionary. Because of that I don't think a warning is necessary, since a warning would imply that the behavior does not fit the minimal expected behavior.
This needs to be written explicitly in the docstring in this case, though.
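A possible docstring note along those lines (the wording is only a sketch, not the PR's actual text):

```python
# Sketch of the suggested docstring addition:
"""
If two scores have the same name, they are automatically disambiguated by
prefixing them with their module name (or the custom name given in a dict);
if there is no name conflict, scores are returned without any prefix.
"""
```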
Thank you! It all looks good :)
The described behavior is actually really nice when there are name conflicts; I just added a comment to describe it in the docstring.
expected_results[f"{dummy_metric.name}_{i}_{k}"] = dummy_result[k] | ||
self.assertDictEqual(expected_results, combined_evaluation.compute(predictions=preds, references=refs)) | ||
|
||
def test_two_modules_with_same_score_name(self): |
you could also test when they have the same module name
I think that is covered in test_duplicate_module, no?
I'll merge this now; if that's not the case I can add more tests in another PR.
LGTM
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
This PR adds the `EvaluationEnsemble` class that lets users combine several `EvaluationModule`s into a single, compact object that behaves like a single module. Here's how it works: under the hood, `add`, `add_batch`, and `compute` iteratively call the methods with the same name on each module.

There are a few scenarios for the returned dict:

- if all score names are unique across modules, the results are returned as-is
- if there are name conflicts, the conflicting scores are prefixed with their module's name (or with the custom name if a dict is passed)
- all scores are prefixed if the user passes `force_prefix=True` at initialization

This should be especially useful if the same metric with different configs is used in one ensemble. Here is an example to illustrate forced prefixing with custom names:
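A minimal sketch of what that example could look like, assuming the `combine` entry point from this PR; the module choices, custom names, and result keys are illustrative:

```python
import evaluate

# Combine two modules under custom names; with force_prefix=True every
# result key is prefixed with its custom name, not only on conflicts.
clf = evaluate.combine(
    {"acc": evaluate.load("accuracy"), "f1_score": evaluate.load("f1")},
    force_prefix=True,
)

results = clf.compute(predictions=[0, 1, 1], references=[0, 1, 1])
# Expected shape of the result (key format and values illustrative):
# {"acc_accuracy": 1.0, "f1_score_f1": 1.0}
```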
I am not married to the name `EvaluationEnsemble`, so if there are any better suggestions I am happy to hear them.