Test all metrics against sklearn (with many input trials) #3230
Comments
I don't think a seed is required here.
I would agree that all metrics should be deterministic, so using different seeds only increases the coverage of cases; as long as you pass the same values to the two functions, it should always give the same result, right?
Agreed, this is a good point.
@awaelchli I agree that we need such a test. We already do that for many metrics:
Hi! I am not a contributor, but I am a user of a library called hypothesis, which may be well suited to this specific case. The library lets the user write parameterized tests and then chooses the cases that are most likely to make the program fail; that is, it is very thorough about edge cases and can really help find the ones that are problematic for the implementation.
Hi, thank you for your recommendation, I'm just not sure I follow it. Would you mind writing a bit more about how you would use https://hypothesis.works in PL...
The idea would be to write a hypothesis test that checks the sklearn metrics against the PL metrics and let the library explore the corner cases (as well as the "common" ones), asserting that both implementations agree, without the need to design the cases by hand or search for complicated patterns.
@CamiVasz that sounds cool, would you mind drafting a small example of how to use it? Eventually we can extend it to more PL cases... 🐰
https://colab.research.google.com/drive/1Dprqr1nbtgCFwsyUyb6UbXe9FE7X73Q5?usp=sharing |
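The notebook above is external, so here is a minimal inline sketch of the same idea (an illustration, not code from the thread): `my_accuracy` is a hypothetical stand-in for a PL metric, checked against sklearn's `accuracy_score`, with Hypothesis generating the label/prediction pairs.

```python
# Hedged sketch: `my_accuracy` is a hypothetical stand-in for the PL metric
# under test; sklearn's accuracy_score is the reference implementation.
import numpy as np
from hypothesis import given, settings
from hypothesis import strategies as st
from sklearn.metrics import accuracy_score


def my_accuracy(preds, target):
    # Fraction of positions where the prediction equals the target.
    return float((np.asarray(preds) == np.asarray(target)).mean())


# Each drawn list is a sequence of (target, prediction) pairs, so the two
# label vectors always have equal length; Hypothesis steers generation
# toward tricky cases (single element, all-equal labels, ...).
@settings(max_examples=100)
@given(st.lists(st.tuples(st.integers(0, 4), st.integers(0, 4)),
                min_size=1, max_size=50))
def test_accuracy_matches_sklearn(pairs):
    target, preds = (np.array(v) for v in zip(*pairs))
    assert my_accuracy(preds, target) == accuracy_score(target, preds)
```

Calling `test_accuracy_matches_sklearn()` directly (or letting pytest collect it) runs the property; on failure Hypothesis shrinks the input to a minimal counterexample.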
@CamiVasz looks cool, but could you tell me the difference between hypothesis and just creating two random tensors myself (using
Hypothesis generation is biased towards edge cases, maximizing the probability of failure. When you generate random numbers, the edge cases you want to find have the same probability of appearing as the easy cases.
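To illustrate the difference (an illustrative example, not from the thread): Hypothesis can actively search its strategy space for a pathological input such as `nan` or `inf`, while uniform random sampling essentially never produces one.

```python
import math
import random

from hypothesis import find
from hypothesis import strategies as st

# find() searches the strategy for a minimal input satisfying the predicate;
# st.floats() includes special values such as nan and inf by default.
nan_case = find(st.floats(), math.isnan)
inf_case = find(st.floats(), math.isinf)
assert math.isnan(nan_case)
assert math.isinf(inf_case)

# Plain uniform sampling over [0, 1) will never produce nan or inf.
random.seed(0)  # fixed seed, purely for reproducibility of the demo
samples = [random.random() for _ in range(10_000)]
assert not any(math.isnan(x) or math.isinf(x) for x in samples)
```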
Just found that pytorch is also using hypothesis |
shall we open a new issue just for hypothesis testing? :] |
Yeah, maybe @CamiVasz can do that and I think it would also be great to see an example of it using one of our tests, to show the motivation. |
it would be great to have it as a HackOctober issue :]
It would be great to work on that! Is this still on board? |
@justusschock @SkafteNicki @ananyahjha93 @teddykoker Do you guys need help with testing the new metrics? @CamiVasz wants to help. |
Yeah sure. I think the whole functional API would be a good place to start. And we could then later extend it to the revamped class interface |
Hand-chosen values are not enough, we need to test with a large batch of inputs where possible.
Something in this style, maybe with a fixed seed:
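The snippet that followed is not shown above; as a hedged sketch of what such a seeded many-trial comparison could look like (using NumPy arrays rather than torch tensors for brevity, and a hypothetical `my_accuracy` in place of the actual PL metric):

```python
import numpy as np
from sklearn.metrics import accuracy_score


def my_accuracy(preds, target):
    # Hypothetical stand-in for the PL metric under test.
    return float((np.asarray(preds) == np.asarray(target)).mean())


rng = np.random.default_rng(1234)  # fixed seed: any failure is reproducible
for _ in range(100):
    # Vary both the batch size and the label values across trials.
    n = int(rng.integers(1, 100))
    target = rng.integers(0, 10, size=n)
    preds = rng.integers(0, 10, size=n)
    assert my_accuracy(preds, target) == accuracy_score(target, preds)
```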