-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply clean transform of updated annotations only for tabular annotation type #1533
Apply clean transform of updated annotations only for tabular annotation type #1533
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## releases/1.8.0 #1533 +/- ##
=================================================
Coverage ? 81.05%
=================================================
Files ? 276
Lines ? 32373
Branches ? 6580
=================================================
Hits ? 26239
Misses ? 4687
Partials ? 1447
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
<!-- Contributing guide: https://github.com/openvinotoolkit/datumaro/blob/develop/CONTRIBUTING.md --> ### Summary CVS-143460 <!-- Resolves openvinotoolkit#111 and openvinotoolkit#222. Depends on openvinotoolkit#1000 (for series of dependent commits). This PR introduces this capability to make the project better in this and that. - Added this feature - Removed that feature - Fixed the problem openvinotoolkit#1234 --> ### How to test <!-- Describe the testing procedure for reviewers, if changes are not fully covered by unit tests or manual testing can be complicated. --> ### Checklist <!-- Put an 'x' in all the boxes that apply --> - [x] I have added unit tests to cover my changes. - [ ] I have added integration tests to cover my changes. - [ ] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md). - [ ] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly ### License - [x] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern. - [ ] I have updated the license header for each file (see an example below). ```python # Copyright (C) 2024 Intel Corporation # # SPDX-License-Identifier: MIT ``` --------- Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>
9ccf4a2
to
8d89e59
Compare
for key in refined_media.data.keys() | ||
if ann.caption.startswith(key) | ||
] | ||
ann = ann.wrap(caption=value[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it guaranteed that value
has always one element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we get value through the above lines, caption would have only one value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value
comes from this for-loop for key in refined_media.data.keys()
. How did you guarantee it is singular?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, the expectation is to do AstypeAnnotations
on a tabular dataset before the Clean
transformation. So if the annotation type is Caption
, then it should have only one of them. We've refined the tabular annotation, so if "Title" is "It's really good!", we've transformed it to "Title:It's really good! We have not yet covered other normal image datasets that are not refined tabular datasets. So if the annotation type is caption, it should be singular. To cover other normal image datasets, I will come back to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems an implicit condition in Datumaro. What I want is to make everything explicit if possible. Relying on the implicit condition can lead to a bug. If someone develops something that can changes the implicit condition, he must never be noticed what he is doing can affect to the behavior of this code. It leads to a bad result in software maintenance. Add something like #1533 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I added https://github.com/sooahleex/datumaro/blob/f6d5c3902d59e7f2d84ad8ef74edf3dfad2c9da6/src/datumaro/plugins/transforms.py#L1979-L1982. If this is not enough to show the implicit condition, please let me know. I will consider it.
for ann in item.annotations: | ||
if isinstance(ann, Tabular): | ||
annotation_values = { | ||
key: refined_media.data[key] for key in item.annotations[0].values.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason that we do not care item.annotations[i].values
for i=1,...
not just i=0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a dataset item contains a tabular annotation type, it will contain only one annotation of tabular for each item. This item will not contain any other annotations. I think this part is a bit hard-coded, so if there is another way to check the annotation type, please let me know.
src/datumaro/plugins/transforms.py
Outdated
if isinstance(ann, Tabular): | ||
annotation_values = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to put in a kind of checker here such as
if isinstance(ann, Tabular): | |
annotation_values = { | |
if isinstance(ann, Tabular): | |
if len(item.annotations) != 1: | |
msg = ... | |
raise ValueError(msg) | |
annotation_values = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this through afd56c5. Thank you for your suggestion.
Summary
Tabular
andCaption
annotation type duringClean
transformClean
How to test
Checklist
License
Feel free to contact the maintainers if that's a concern.