Is your feature request related to a problem? Please describe.
When using LFAnalysis and the lf_summary method I often find myself wondering what the incorrect instances for a particular labeling function actually are. It would be useful to have a way to return all the incorrectly labeled instances for a particular LF, or optionally a sample of the incorrect instances.
Describe the solution you'd like
I'd like to add a new method to LFAnalysis, which could be called lf_incorrect.
It would need to take in your data_points and corresponding Y. It would then return the instances from data_points that do not correspond to Y. Since all the other lf_ methods work for each LF, I think this could return a dictionary mapping LF names to their incorrectly labeled instances.
If large datasets with a lot of incorrect instances are a concern, I could add an optional “max_instances” parameter to cap the number of instances returned per LF.
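To make the proposal concrete, here is a rough sketch of what lf_incorrect could do, written in plain Python rather than against Snorkel's actual classes. The signature, the label matrix argument L (one row per data point, one column per LF), the lf_names argument, and the choice to skip abstains (assumed to be -1) are all my assumptions for illustration, not existing API.

```python
from typing import Dict, List, Optional, Sequence

ABSTAIN = -1  # assumed abstain value; abstains are not counted as incorrect


def lf_incorrect(
    data_points: Sequence,
    L: Sequence[Sequence[int]],
    Y: Sequence[int],
    lf_names: Sequence[str],
    max_instances: Optional[int] = None,
) -> Dict[str, List]:
    """Hypothetical helper: map each LF name to the data points it mislabeled."""
    incorrect = {}
    for j, name in enumerate(lf_names):
        # Keep data points where LF j voted (did not abstain) and disagreed with Y.
        wrong = [
            x
            for x, row, y in zip(data_points, L, Y)
            if row[j] != ABSTAIN and row[j] != y
        ]
        # Optionally cap how many instances are returned per LF.
        incorrect[name] = wrong if max_instances is None else wrong[:max_instances]
    return incorrect


# Toy data: four data points, two made-up LFs.
data_points = ["spam offer", "hello friend", "win cash", "meeting at 3"]
Y = [1, 0, 1, 0]
L = [[1, -1], [1, 0], [0, 0], [0, 0]]
lf_names = ["lf_keyword", "lf_short"]

result = lf_incorrect(data_points, L, Y, lf_names)
capped = lf_incorrect(data_points, L, Y, lf_names, max_instances=1)
```

Truncating to the first max_instances keeps the sketch simple; a real implementation might prefer a random sample instead.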
Additional context
This is something I would be looking to submit a PR for.
Great idea! Similar functionality exists in the get_label_buckets method under snorkel/analysis/error_analysis: https://github.com/HazyResearch/snorkel/blob/e316d5700cbfd2243c0d5485537ef310fc0e7a1e/snorkel/analysis/error_analysis.py#L9. To use it, you would pass a gold labels vector and an LF labels vector, and that will return different error buckets you could pull from to get the indices of the corresponding data points where the LF was incorrect. If you wanted to submit a PR that wraps that method and has the functionality you described, that'd be great! You could likely stick it in that same error_analysis file.
Addresses #1602. Added a method to analysis/error_analysis that wraps the get_label_buckets functionality. Given a bucket, a NumPy array x of your data, and the corresponding y label vector(s), it returns only the instances of x that fall in that bucket.
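The bucket-then-select idea can be sketched as follows. This is a plain-Python stand-in, not the code from the PR: label_buckets only mimics the bucketing behavior described above, instances_in_bucket is a hypothetical name for the wrapper, and both use lists instead of NumPy arrays so the example has no dependencies.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def label_buckets(*label_vectors: Sequence[int]) -> Dict[Tuple[int, ...], List[int]]:
    """Stand-in for bucketing: group data point indices by their label combination."""
    buckets = defaultdict(list)
    for i, combo in enumerate(zip(*label_vectors)):
        buckets[combo].append(i)
    return dict(buckets)


def instances_in_bucket(
    bucket: Tuple[int, ...], x: Sequence, *label_vectors: Sequence[int]
) -> List:
    """Hypothetical wrapper: return only the items of x that fall in `bucket`."""
    indices = label_buckets(*label_vectors).get(tuple(bucket), [])
    return [x[i] for i in indices]


# Toy data: gold labels first, then one LF's labels.
x = ["a", "b", "c", "d"]
y_gold = [1, 0, 1, 0]
y_lf = [1, 1, 0, 0]

# Bucket (0, 1): gold label is 0 but the LF voted 1.
false_positives = instances_in_bucket((0, 1), x, y_gold, y_lf)
# Bucket (1, 0): gold label is 1 but the LF voted 0.
false_negatives = instances_in_bucket((1, 0), x, y_gold, y_lf)
```

Each key is the tuple of labels in the order the vectors were passed, so the off-diagonal buckets are the ones where the LF disagreed with the gold label.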