QuantileDifferenceReason and StandardDeviationReason #28
Do you have an example (preferably something semi-real life) that demonstrates the utility of this technique?
@FBruzzesi ah yeah that makes it more clear. On the studentized residuals ... I think a bell-curve assumption for the error might work for some instances, but not all. I'm wondering if it makes sense to introduce a QuantileDifferenceReason and a StandardDeviationReason instead.
I believe those two Reason(s) you are proposing should cover the majority of cases.
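To make the standard-deviation flavour concrete, here is a minimal sketch of the idea. The function name, signature, and the 2-sigma threshold are illustrative assumptions, not the library's actual API:

```python
import numpy as np

def standard_deviation_doubt(y_true, y_pred, n_sigma=2.0):
    # Doubt rows whose absolute error exceeds n_sigma times the
    # standard deviation of the errors (hypothetical helper).
    errors = np.asarray(y_true) - np.asarray(y_pred)
    return (np.abs(errors) > n_sigma * errors.std()).astype(int)

y_true = np.array([1.0, 2.0, 3.0, 4.0, 50.0])
y_pred = np.array([1.0, 2.0, 3.0, 4.0, 4.0])
print(standard_deviation_doubt(y_true, y_pred))  # -> [0 0 0 0 1]
```

The single gross error inflates the standard deviation but still clears the 2-sigma bar, so only that row is doubted.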
I've changed the title of this issue to QuantileDifferenceReason and StandardDeviationReason to reflect this. I'm not sure when I'll have time to work on this feature though. Part of me is also wondering if we should first find a representative dataset such that we might have a valid demo for these tools. Any suggestions for a dataset are very welcome.
I can work on it and try to find a toy dataset where it applies.
Grand! Let me know if you'd appreciate any support/review. My advice would be to first try to run the problem on the dataset before worrying too much about implementation. It's much easier to tackle the theoretical part of a problem when there's a practical example done.
Hey @koaning, I have a few questions/observations. Please let me know if something isn't clear; I may comment here with some code snippets as well if needed.
I was thinking that we sort the residuals and allow the user to say something like "assign doubt to all rows where the error is larger than the 95% quantile".
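That suggestion can be sketched in a few lines. The name quantile_difference_doubt is hypothetical, and the 0.75 quantile in the demo is arbitrary:

```python
import numpy as np

def quantile_difference_doubt(y_true, y_pred, quantile=0.95):
    # Doubt every row whose absolute error is larger than the given
    # quantile of all absolute errors (hypothetical name/signature).
    errors = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return (errors > np.quantile(errors, quantile)).astype(int)

y_true = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
y_pred = np.array([1.1, 2.1, 2.9, 4.2, 5.0])
print(quantile_difference_doubt(y_true, y_pred, quantile=0.75))  # -> [0 0 0 0 1]
```

By construction this flags roughly a fixed fraction of rows for any model, regardless of how good the fit actually is.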
I usually resort to the usual boxplot ranges (quartiles ± 1.5 × IQR) for this kind of outlier check.
Did these yield the wrong labels? One thing you might want to try is to flip a few labels randomly upfront and to see if you can retrieve the flipped labels with this trick. It's not a perfect proxy, but it's a plausible demo.
This looks very deterministic, meaning that for any given model you will doubt 5% of the results. On the other hand, using the usual boxplot ranges mentioned above (or any other user-favourite quantile multipliers) may or may not result in doubt. Imagine an error distribution that is 0-centered and "very" symmetrical: the former would still doubt some results, while the latter wouldn't.
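A minimal sketch of the boxplot-range alternative, assuming the usual Q1 − 1.5·IQR / Q3 + 1.5·IQR whiskers; the helper name is hypothetical. Unlike a fixed quantile cutoff, it can flag nothing at all:

```python
import numpy as np

def boxplot_range_doubt(y_true, y_pred, k=1.5):
    # Doubt rows whose residual falls outside the boxplot whiskers
    # [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper).
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    q1, q3 = np.quantile(residuals, [0.25, 0.75])
    iqr = q3 - q1
    outside = (residuals < q1 - k * iqr) | (residuals > q3 + k * iqr)
    return outside.astype(int)

# A tight, symmetric, 0-centered error distribution: nothing gets doubted,
# whereas a fixed-quantile rule would always flag some rows.
residuals_demo = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])
print(boxplot_range_doubt(residuals_demo, np.zeros(5)))  # -> [0 0 0 0 0]
```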
As this is a regression task, I am not even sure what flipping labels exactly means. I am trying to add/multiply the feature matrix by random noise, then checking whether the rows I get back correspond to the perturbed ones.
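One way to set up the "flip and retrieve" experiment for regression is to shuffle a fraction of the targets among themselves, train after corrupting, and check whether the largest errors point back at the corrupted rows. Everything below (names, the 5% corruption rate, the plain least-squares fit standing in for the model) is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, frac = 200, 5, 0.05
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# Corrupt a random 5% of the targets by shuffling them among themselves.
n_bad = int(frac * n)
bad_idx = rng.choice(n, size=n_bad, replace=False)
y_corrupt = y.copy()
y_corrupt[bad_idx] = y_corrupt[rng.permutation(bad_idx)]

# Fit *after* corrupting, then flag the n_bad largest absolute errors.
beta, *_ = np.linalg.lstsq(X, y_corrupt, rcond=None)
errors = np.abs(y_corrupt - X @ beta)
flagged = np.argsort(errors)[-n_bad:]
precision = len(set(flagged) & set(bad_idx)) / n_bad
print(f"precision@{n_bad}: {precision:.2f}")
```

Note that a random permutation can leave some labels (almost) unchanged, so perfect retrieval should not be expected.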
This reminds me, we may want to have a utility submodule to make these kinds of experiments easy.
While working on such a test, I find that the resulting metric is kind of misleading: predicted values are not influenced by the shuffle; however, by random chance, a few shuffled labels end up close to their original values. Focusing solely on those datapoints satisfying both of the following conditions:
Then testing on the diabetes toy dataset from sklearn with 1000 different random states yields:
Cool! Just to confirm, could you verify the precision/recall values? Also, when are you training your model, before or after the shuffling? If we're to match reality, we should train the model after we've shuffled.
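Just to pin down the definitions being verified here: against a known set of corrupted rows, precision and recall reduce to set intersections. A small hypothetical helper:

```python
def precision_recall(flagged, truly_bad):
    # Precision/recall of a set of doubted rows against the rows that
    # were actually corrupted (hypothetical helper, definitions only).
    flagged, truly_bad = set(flagged), set(truly_bad)
    tp = len(flagged & truly_bad)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(truly_bad) if truly_bad else 0.0
    return precision, recall

print(precision_recall([1, 2, 3, 4], [2, 4, 6]))  # -> (0.5, 0.6666666666666666)
```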
Here are some of the stats:
Yes, shuffling and training are done in that order. (*) Any better name for this one? Should we keep all three of these reasons?
One final question before we move on (although the results themselves are pretty interesting!). Could you check whether these numbers change much if you flip more or fewer labels? I imagine that 1%, 5%, 10% label errors might yield different results.
The following results are mean scores across 500 different random states per reason/%-shuffled pair:
Nicely done! It's interesting to see how the different reasons compare across shuffle percentages. As far as I'm concerned, a PR for the StandardizedErrorReason is welcome. If you happen to have any benchmarking code to share, I might consider saving that for the documentation as well.
@koaning I just found an error in the implementation. Regarding some sample code, I'm not sure where I should/could share it.
@FBruzzesi if it's a notebook you can put it in a GitHub gist, if that's easier for you.
Issue #28, StandardizedErrorReason class
I've just merged #29. Before making a new release though, I'm wondering if it makes sense to also add the other proposed reasons first.
Actually ... the new method is listed on the readme, so I should release a patch. Lemme do that real quick.
Done! I'll also make an announcement tomorrow for it. Got a twitter handle? If so I can give you a shoutout.
I feel like you are not actually convinced by these other methods! I will make a notebook illustrating them as soon as I have the time, and maybe we can discuss whether to add them afterward.
Also, you should be able to find me on twitter as @BruzzesiFr.
Just to be explicit; I very much appreciate the work you're doing here! But which method are you referring to now? The HighLeveragePointReason and HighStudentizedResidualReason? I figured moving on to those could wait until we have a convincing demo. Am looking forward to your notebook 👍
Interesting! I've added utility methods to the main branch that allow folks to play around with "flipping" labels in a subset. I'll likely also add some plotting functionality around it so we can get some "precision_at_k" and "recall_at_k" plots to compare approaches. My impression so far is that for some dataset/model/reason combinations it's very easy to find bad labels, while for others it's barely better than random sorting.
I'll likely merge the plotting tonight and I'll also push a new version. Out of curiosity, since you've given the library a spin already, are there any features missing, in your opinion, with regard to plotting?
As you may have noticed, I work much more with regression problems than classification tasks. There is a lot of custom plotting I do when it comes to checking results/predictions, and I'm currently working on a (still private) library to standardize a few of these checks. That said, I'm not sure what you could integrate here; maybe something as simple as a residual plot with different colors for doubted/non-doubted points, similar to what I tried to do in the notebook I just shared. Feel free to assign me such a task if needed.
@koaning should we proceed to close this issue?
Hey! I was thinking it would make sense to add two more reasons for regression tasks, namely something like HighLeveragePointReason and HighStudentizedResidualReason. Citing Wikipedia:
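For background, both proposed quantities can be computed directly for an ordinary least-squares fit: leverage is the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ, and the (internally) studentized residual rescales each residual by σ̂·√(1 − hᵢᵢ). The sketch below is illustrative, not the eventual Reason implementation:

```python
import numpy as np

def leverage_and_studentized(X, y):
    # Leverage h_ii: diagonal of the hat matrix H = X (X'X)^-1 X'.
    # Internally studentized residual: e_i / (sigma_hat * sqrt(1 - h_ii)).
    X = np.column_stack([np.ones(len(X)), X])        # add intercept column
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance estimate
    return h, resid / np.sqrt(sigma2 * (1.0 - h))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=50)
h, student = leverage_and_studentized(X, y)
print(round(h.sum(), 6))  # leverages sum to the number of parameters -> 3.0
```

A reason built on these could then doubt rows with leverage above some multiple of p/n, or with |studentized residual| above a chosen cutoff.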