Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ConfigurableValidator #1142

Merged
merged 10 commits into from
Sep 8, 2023
Merged

Conversation

wonjuleee
Copy link
Contributor

Summary

How to test

Checklist

  • I have added unit tests to cover my changes.​
  • I have added integration tests to cover my changes.​
  • I have added the description of my changes into CHANGELOG.​
  • I have updated the documentation accordingly

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

@wonjuleee wonjuleee requested review from a team as code owners September 6, 2023 05:50
@wonjuleee wonjuleee requested review from vinnamkim and removed request for a team September 6, 2023 05:50
@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Patch coverage: 84.12% and project coverage change: +0.07% 🎉

Comparison is base (ec00525) 79.86% compared to head (3aa11ab) 79.93%.
Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1142      +/-   ##
===========================================
+ Coverage    79.86%   79.93%   +0.07%     
===========================================
  Files          264      265       +1     
  Lines        29391    29895     +504     
  Branches      5753     5889     +136     
===========================================
+ Hits         23473    23898     +425     
- Misses        4583     4640      +57     
- Partials      1335     1357      +22     
Flag Coverage Δ
macos-11_Python-3.8 79.05% <84.12%> (+0.09%) ⬆️
ubuntu-20.04_Python-3.8 79.92% <84.12%> (?)
windows-2019_Python-3.8 79.86% <84.12%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/datumaro/plugins/configurable_validator.py 84.12% <84.12%> (ø)

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.

📢 Have feedback on the report? Share it here.

pass

def _update_ann_type_stats(self, item_key: tuple, annotation: Annotation):
if self.BBOX_WARNINGS & self.warnings:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that we build a collection of update functions when initializing this class instance according to the input configuration. This is for doing not to check its configuration every time during the dataset item loop. We can iterate on the collection of update functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to register function for warning when initialization at cc10ff4. Could you check this?

Comment on lines 151 to 152
self.stats["invalid_value"] = set()
self.stats["negative_length"] = set()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a typed schema, e.g. @dataclass, for the output data store? It currently uses a dictionary with string key and it is hard to see what is going on this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced StatsData at cc10ff4.

Comment on lines 385 to 461
if task == TaskType.classification:
if {MultiLabelAnnotations} & self.warnings:
reports[task] += self._check_multiple_label(stats)
Copy link
Contributor

@vinnamkim vinnamkim Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we produce the reports as dict, is it possible to update each part of the dictionary in each collector's member function? I hope that we do not gather all the things getting together in this function.

@wonjuleee wonjuleee force-pushed the conf_val branch 3 times, most recently from cc10ff4 to d79f11c Compare September 7, 2023 05:13
return None

def compute_statistics(self, dataset):
if not self.init_flag:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this flag? I guess that this class has configurations when it is created. With following these configuration and taking the dataset in this function argument, it produce statistics according to those two inputs. Therefore, we should initialize statistics carriers (self.all_stats) every time we take dataset, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you meant that we have to compute statistics from scratch when dataset is passed everytime, is this correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. However, if you imagine the functionality to update incrementally all_stats by calling compute_statistics() again and again for the sequence of dataset, I think it is better to introduce all_stats as the function argument (not member variables) for the more understandable data flow in this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed init_flag following your opinion.

src/datumaro/plugins/configurable_validator.py Outdated Show resolved Hide resolved
Comment on lines 949 to 950
if not stats_collector:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this if statement? I think that L892 and self._init_stats_collector() make there is no None in self.all_stats without any exception. Similar to my above comment, wouldn't it be better to let self._init_stats_collector() return all_stats rather than storing it as the member variable? The all_stats are dependent to dataset, so that maintaining it as a member variable seems not good for understanding the data flow in a function.

Copy link
Contributor

@vinnamkim vinnamkim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@wonjuleee wonjuleee merged commit 0cdf176 into openvinotoolkit:develop Sep 8, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants