-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ConfigurableValidator #1142
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #1142 +/- ##
===========================================
+ Coverage 79.86% 79.93% +0.07%
===========================================
Files 264 265 +1
Lines 29391 29895 +504
Branches 5753 5889 +136
===========================================
+ Hits 23473 23898 +425
- Misses 4583 4640 +57
- Partials 1335 1357 +22
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. 📢 Have feedback on the report? Share it here. |
pass | ||
|
||
def _update_ann_type_stats(self, item_key: tuple, annotation: Annotation): | ||
if self.BBOX_WARNINGS & self.warnings: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope that we build a collection of update functions when initializing this class instance according to the input configuration. This is for doing not to check its configuration every time during the dataset item loop. We can iterate on the collection of update functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to register function for warning when initialization at cc10ff4. Could you check this?
self.stats["invalid_value"] = set() | ||
self.stats["negative_length"] = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a typed schema, e.g. @dataclass
, for the output data store? It currently uses a dictionary with string key and it is hard to see what is going on this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced StatsData at cc10ff4.
if task == TaskType.classification: | ||
if {MultiLabelAnnotations} & self.warnings: | ||
reports[task] += self._check_multiple_label(stats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although we produce the reports as dict
, is it possible to update each part of the dictionary in each collector's member function? I hope that we do not gather all the things getting together in this function.
cc10ff4
to
d79f11c
Compare
return None | ||
|
||
def compute_statistics(self, dataset): | ||
if not self.init_flag: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this flag? I guess that this class has configurations when it is created. With following these configuration and taking the dataset
in this function argument, it produce statistics according to those two inputs. Therefore, we should initialize statistics carriers (self.all_stats) every time we take dataset
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you meant that we have to compute statistics from scratch when dataset is passed everytime, is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. However, if you imagine the functionality to update incrementally all_stats
by calling compute_statistics()
again and again for the sequence of dataset
, I think it is better to introduce all_stats
as the function argument (not member variables) for the more understandable data flow in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed init_flag following your opinion.
if not stats_collector: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this if statement? I think that L892 and self._init_stats_collector()
make there is no None
in self.all_stats
without any exception. Similar to my above comment, wouldn't it be better to let self._init_stats_collector()
return all_stats
rather than storing it as the member variable? The all_stats
are dependent to dataset
, so that maintaining it as a member variable seems not good for understanding the data flow in a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Summary
How to test
Checklist
License
Feel free to contact the maintainers if that's a concern.