
Add a test to check that Evaluator evaluations match transformers examples #163

Merged: 18 commits merged into huggingface:main on Jul 4, 2022

Conversation

fxmarty (Contributor) commented on Jun 27, 2022

Basically this just checks that Trainer and Evaluator return the same evaluations. This was useful for me while working on Optimum to debug slight differences between evaluations with pipelines vs. the Trainer (https://github.com/fxmarty/optimum/tree/tests-benchmark/tests/benchmark/onnxruntime), where, depending on the task, you need to be careful with the kwargs passed to the pipeline to match the output.

Let me know if you think such a test for each task is useful or not. If not, I will just add TokenClassificationEvaluator, QuestionAnsweringEvaluator and ImageProcessingEvaluator without the tests.
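For reference, a minimal sketch of the kind of parity check described above, using text classification as the example task. The checkpoint, dataset, column names, and label mapping below are placeholders, and the exact Evaluator kwargs may differ between evaluate versions and from the test that was actually merged:

```python
# Hypothetical parity check: the same model evaluated through
# transformers.Trainer and through evaluate's Evaluator should score
# identically. Checkpoint, dataset and columns are placeholders.
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    pipeline,
)

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder
data = load_dataset("glue", "sst2", split="validation[:64]")    # placeholder

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
accuracy = evaluate.load("accuracy")

# 1) Score via transformers.Trainer
def preprocess(examples):
    return tokenizer(examples["sentence"], truncation=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return accuracy.compute(predictions=logits.argmax(axis=-1), references=labels)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmp_eval", report_to="none"),
    eval_dataset=data.map(preprocess, batched=True),
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer_score = trainer.evaluate()["eval_accuracy"]

# 2) Score via the Evaluator on top of a pipeline
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
task_evaluator = evaluate.evaluator("text-classification")
evaluator_score = task_evaluator.compute(
    model_or_pipeline=pipe,
    data=data,
    metric="accuracy",
    input_column="sentence",
    label_column="label",
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},
)["accuracy"]

# The two paths should agree (up to rounding of the reported float).
assert round(trainer_score, 4) == round(evaluator_score, 4)
```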

HuggingFaceDocBuilderDev commented on Jun 27, 2022

The documentation is not available anymore as the PR was closed or merged.

lvwerra (Member) commented on Jun 27, 2022

Hi @fxmarty, thanks for working on this! That does indeed look useful, kind of a compatibility test between the Trainer and the Evaluator. I am curious to hear what @LysandreJik thinks about this and how this is handled in other libraries (e.g. Trainer vs. accelerate). The evaluate library is generally not tightly coupled to transformers, so I am not sure how much we should test against it.

lvwerra (Member) commented on Jun 30, 2022

I have been thinking about this some more: although this is a great tool for debugging the Evaluator, at the point the PR is merged we know the expected value and can simply test against it, no? There is no reason to run the Trainer to get the value to compare against. What do you think?
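For comparison, a minimal sketch of the fixed-value approach: run the Evaluator once, record the score, and assert against it in the test. The checkpoint, dataset, and the 0.875 value are purely illustrative:

```python
# Hypothetical fixed-value test: pin the score observed when the test was
# written instead of re-running the Trainer. All names and the expected
# value below are illustrative.
import evaluate
from datasets import load_dataset
from transformers import pipeline

data = load_dataset("glue", "sst2", split="validation[:64]")  # placeholder
pipe = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder
)

task_evaluator = evaluate.evaluator("text-classification")
results = task_evaluator.compute(
    model_or_pipeline=pipe,
    data=data,
    metric="accuracy",
    input_column="sentence",
    label_column="label",
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},
)

# Expected value recorded once, when the test is written (illustrative number).
assert results["accuracy"] == 0.875
```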

fxmarty (Contributor, Author) commented on Jul 1, 2022

Agreed, it's probably a bit overkill, although checking against the PyTorch examples makes sure that we continue to match the behavior of the Trainer / PyTorch example scripts in the future.

If you would prefer to have a fixed value in the test, let me know and I will edit accordingly!

lvwerra (Member) left a comment


After discussing with @LysandreJik I think we can add it. It will be an interesting signal if this fails. Thanks a lot for adding this! I left a few minor comments, let me know if you have any questions.

I don't think we need subfolders at this point and you can just create a new file in tests (e.g. test_trainer_evaluator_parity.py).

Review comments (now outdated and resolved) were left on tests/evaluator/test_text_classification.py and .github/workflows/test_evaluator.yml.
fxmarty (Contributor, Author) commented on Jul 1, 2022

Thank you for the review @lvwerra, I made changes according to your comments! I think CircleCI is not running here, so it would be nice if you could trigger it.

Once this one is merged and we have agreed on a solution in #167, I will edit the test for token-classification accordingly.

lvwerra (Member) left a comment


LGTM! 🚀

lvwerra merged commit 4532f79 into huggingface:main on Jul 4, 2022