I have seen that in PR #158 you removed recall and precision from the ROUGE score calculation, which now returns only the F1 score.
May I ask why this decision was made, and why there doesn't seem to be an option to keep recall and precision in the returned output?
This is also a breaking change (in the sense that code written for evaluate==0.1.2 will no longer work in evaluate==0.2.2).
Shouldn't a backward-incompatible change require a major version bump according to https://semver.org ?
Thanks for the clarification
Yes, this was a breaking change - we planned to do it before the initial release but it slipped through. There are a number of advantages to moving from the RougeScore object that was returned to a pure Python dict. If you find recall and precision useful, we could add an option (e.g. detailed=True) to the compute call to return them as well.
We haven't had a full major release yet, so there might be some breaking changes here and there, but there are none planned for the core of metrics and we really want to avoid it.
I definitely agree a pure Python dictionary is much better; however, I believe recall and precision could be included in a Python dict without reintroducing the old RougeScore object.
Overall, many summarization papers report ROUGE scores based on F1, but some also use recall (for example, for content selection), so I believe researchers would find it valuable to have an option to see recall and precision (perhaps in a pure Python dict).
I'm happy to use the older version now that I've realised the issue, but if there is more demand for this detailed=True feature, it may be worth considering in the future.
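For context, ROUGE-1 precision and recall are simply unigram-overlap ratios relative to the prediction and reference lengths, and F1 is their harmonic mean. A minimal self-contained sketch (the `rouge1_prf` helper is hypothetical, not the library's implementation, and omits stemming and the clipped-count details the real metric uses):

```python
from collections import Counter

def rouge1_prf(prediction: str, reference: str) -> dict:
    """Illustrative ROUGE-1: unigram overlap between prediction and reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: each token counts at most as often as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1_prf("the cat sat on the mat", "the cat is on the mat")
# 5 of 6 prediction unigrams match, 5 of 6 reference unigrams are covered
```

Returning all three values in a dict like this is what the proposed detailed=True option could expose, without bringing back the old object type.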
I was wondering whether this thread was taken into consideration, because with evaluate the ROUGE metric still reports only a single score rather than precision/recall/F-score.