
Rouge score not backward compatible as recall and precision are no longer returned #260

Closed
AndreaSottana opened this issue Aug 18, 2022 · 3 comments


AndreaSottana commented Aug 18, 2022

Hello

I have seen that in PR #158 you removed recall and precision from the ROUGE score calculation, which now only returns the F1 score.
May I ask why this decision was made, and why there doesn't seem to be an option to keep recall and precision in the returned output?

This is also a breaking change, in the sense that code written for evaluate==0.1.2 will no longer work with evaluate==0.2.2.
Shouldn't a backward-incompatible change require a major version bump according to https://semver.org ?
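To make the breakage concrete, here is a minimal sketch; the pre-0.2 output shape shown in the comments is an assumption based on the aggregate objects from the rouge_score package that the old metric returned:

```python
import evaluate

rouge = evaluate.load("rouge")
results = rouge.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat sat on the mat"],
)

# evaluate==0.2.2 returns a plain dict of F1 floats only, e.g.
# {'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0, 'rougeLsum': 1.0}
print(results["rouge1"])

# Under evaluate==0.1.2 each entry was (as far as I recall) an aggregate
# rouge_score object, so code like the following used to work and now fails:
# print(results["rouge1"].mid.precision)
# print(results["rouge1"].mid.recall)
```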

Thanks for the clarification

lvwerra (Member) commented Aug 18, 2022

Hi @AndreaSottana

Yes, this was a breaking change - we had planned to make it before the initial release but it slipped through. There are a number of advantages to moving from the RougeScore object that was previously returned to a pure Python dict. If you find recall and precision useful, we could add an option (e.g. detailed=True) to the compute call to return those as well.
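Purely as a hypothetical sketch of that idea (no such detailed option exists in evaluate today; the nested keys and the values below are illustrative assumptions only):

```python
# Hypothetical output shape for a detailed=True option -- not an existing
# evaluate API; the nested keys and numbers are only an illustration.
detailed_results = {
    "rouge1": {"precision": 0.91, "recall": 0.84, "fmeasure": 0.87},
    "rouge2": {"precision": 0.72, "recall": 0.66, "fmeasure": 0.69},
}
print(detailed_results["rouge1"]["recall"])
```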

We haven't had a full major release yet, so there may be some breaking changes here and there, but none are planned for the core of the metrics, and we really want to avoid them.

Sorry for the inconvenience!

AndreaSottana (Author) commented:

Thanks @lvwerra for your quick reply.

I definitely agree that a pure Python dictionary is much better; however, I believe it would be possible to include recall and precision in a Python dict without reverting to the old RougeScore object.
Most summarization papers seem to report ROUGE scores based on F1, but some also use recall (for example, for content selection), so I believe it would be valuable for researchers to have an option to see recall and precision (perhaps still in a pure Python dict).
I'm happy to use the older version now that I've realised the issue, but if there is more demand for this detailed=True feature, it may be worth considering for the future.

Thanks again
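As a practical workaround in the meantime, the underlying rouge_score package (which evaluate's ROUGE metric wraps) can be called directly to obtain per-example precision and recall; a minimal sketch, assuming rouge_score is installed:

```python
# Workaround sketch: use the rouge_score package directly to get
# precision/recall/F1, independent of what evaluate returns.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

# score(target, prediction) returns a dict mapping each ROUGE variant
# to a Score namedtuple with precision, recall and fmeasure fields.
scores = scorer.score(
    "the cat sat on the mat",        # reference
    "a cat was sitting on the mat",  # prediction
)

for name, score in scores.items():
    print(name, score.precision, score.recall, score.fmeasure)
```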

lvwerra closed this as completed Dec 6, 2022
hanane-djeddal commented:

I was wondering whether this thread was taken into consideration, because with evaluate the ROUGE metric still only reports a single score per variant and not precision/recall/F-score.

Thanks!
