
fix add method #424

Merged (2 commits) on Mar 14, 2023

Conversation

@hazrulakmal (Contributor) commented on Feb 18, 2023

Fixes #355

  1. Corrects a typo in Rouge's README.md
  2. Changes self.info.features to self.selected_feature_format in the add method

This is my first attempt at debugging a public codebase, so I'd definitely appreciate a few pointers for guidance.

I've investigated the two methods you suggested as possible sources of the error, @lvwerra, namely _infer_feature_from_example and _enforce_nested_string_type.

After some debugging, I found that the former appears to work as intended: it correctly picks the right schema format for a given prediction and reference input. For example, when I ran the code below,

import evaluate

rouge = evaluate.load("rouge")
rouge.add(predictions="sentence 1", references="sentence 1")
print(rouge.compute())

there are two schemas available for rouge (since it supports multiple reference inputs for a single prediction), stored as a list:

[{'predictions': Value(dtype='string', id='sequence'), 'references': Sequence(feature=Value(dtype='string', id='sequence'), length=-1, id=None)},
 {'predictions': Value(dtype='string', id='sequence'), 'references': Value(dtype='string', id='sequence')}]

The method then saves the matching format to self.selected_feature_format.
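To illustrate, here is a minimal, self-contained sketch of how picking the matching schema from such a list could work. This is my own simplified stand-in, not the actual _infer_feature_from_example implementation, and pick_matching_schema is a hypothetical helper name:

from datasets import Features, Sequence, Value

# Two candidate schemas, mirroring the list above: references as a list of
# strings vs. references as a single string.
candidate_features = [
    Features({"predictions": Value("string"), "references": Sequence(Value("string"))}),
    Features({"predictions": Value("string"), "references": Value("string")}),
]

def pick_matching_schema(example, candidates):
    # Return the first candidate whose "references" nesting matches the example.
    for features in candidates:
        expects_list = isinstance(features["references"], Sequence)
        if expects_list == isinstance(example["references"], list):
            return features
    raise ValueError("no candidate schema matches the example")

example = {"predictions": "sentence 1", "references": "sentence 1"}
print(pick_matching_schema(example, candidate_features))
# prints the second schema, with 'references' as a single string Value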

I suspect the problem arises when the code calls _enforce_nested_string_type on self.info.features, which stores both schemas, rather than on the single schema chosen by _infer_feature_from_example. This discrepancy leads _enforce_nested_string_type to enforce the wrong schema: in this case the sequence-of-references schema, when it should have enforced the single-string reference schema.

So I changed self.info.features to self.selected_feature_format and it works. I tried a few other tests that I created in debug_test.py and they all passed. This probably isn't the right way to write code tests, but I'm new, so yeah 😅.
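To make the failure mode and the fix concrete, here is a self-contained sketch of why enforcing string types against the wrong schema fails for a single-string reference. enforce_string_type below is my own simplified stand-in, not the library's _enforce_nested_string_type:

from datasets import Features, Sequence, Value

def enforce_string_type(schema, obj):
    # Simplified stand-in: require obj's nesting to match the schema.
    if isinstance(schema, Sequence):
        if not isinstance(obj, list):
            raise TypeError(f"expected a list of strings, got {type(obj).__name__}")
        for item in obj:
            enforce_string_type(schema.feature, item)
    elif isinstance(schema, Value) and schema.dtype == "string":
        if not isinstance(obj, str):
            raise TypeError(f"expected a string, got {type(obj).__name__}")

sequence_schema = Features({"references": Sequence(Value("string"))})
value_schema = Features({"references": Value("string")})
reference = "sentence 1"

# Enforcing against the un-selected sequence schema raises, matching the reported bug.
try:
    enforce_string_type(sequence_schema["references"], reference)
except TypeError as err:
    print("sequence schema:", err)

# Enforcing against the schema selected for this example succeeds.
enforce_string_type(value_schema["references"], reference)
print("value schema: ok")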

@HuggingFaceDocBuilderDev commented on Feb 18, 2023

The documentation is not available anymore as the PR was closed or merged.

@hazrulakmal (Contributor, Author) commented on Feb 27, 2023

I've tested the code using $ python -m pytest ./tests/ and it passes all existing tests except for these two:

================================================================================= short test summary info ==================================================================================
FAILED tests/test_evaluator.py::TestTextClassificationEvaluator::test_bootstrap - AssertionError: 0.3355737770310757 != 0.33333 within 5 places (0.0022437770310757 difference)
FAILED tests/test_evaluator.py::TestTextClassificationEvaluator::test_bootstrap_and_perf - AssertionError: 0.3355737770310757 != 0.333333 within 5 places (0.0022407770310757247 difference)
=========================================================== 2 failed, 149 passed, 82 skipped, 9 warnings in 396.56s (0:06:36) ==============================================================

I suspect these failures are due to bootstrap randomness rather than the underlying code; the code without the fix also fails these two tests.

On top of that, I also ran a few other checks myself, like the ones below:

import evaluate

accuracy = evaluate.load("accuracy")
accuracy.add(predictions=1, references=1)
print("Accuracy metrics")
print(accuracy.compute())

rouge = evaluate.load("rouge")
print(rouge.features)
predictions = ["hello there", "general kenobi"]
references = [["hello", "there"], ["general kenobi", "general yoda"]]
for sample_pred, sample_ref in zip(predictions, references):
    rouge.add(predictions=sample_pred, references=sample_ref)
print("ROUGE metrics")
print(rouge.compute())

The output I receive is exactly what we would expect to see, as shown in the rouge docs.

Also, when I run

rouge = evaluate.load("rouge")
rouge.add(predictions="sentence 1", references="sentence 2")
print(rouge.compute())

like the person who reported this bug did, I receive the output {'rouge1': 0.5, 'rouge2': 0.0, 'rougeL': 0.5, 'rougeLsum': 0.5}.

@hazrulakmal hazrulakmal marked this pull request as ready for review February 27, 2023 13:12
@lvwerra (Member) left a comment


Awesome, thanks for fixing!

@lvwerra lvwerra merged commit 926d1fe into huggingface:main Mar 14, 2023
unna97 pushed a commit to unna97/evaluate that referenced this pull request Mar 20, 2023
* fix self.info.features

* remove debug_test.py

Successfully merging this pull request may close these issues:

Problem with add method and Rouge (#355)