Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply clean transform of updated annotations only for tabular annotation type #1533

Conversation

sooahleex
Copy link
Contributor

@sooahleex sooahleex commented Jun 17, 2024

Summary

  • Fix to update annotation for Tabular and Caption annotation type during Clean transform
  • Add comment for Clean
  • Fix typo of function name

How to test

  • Update unit test

Checklist

  • I have added unit tests to cover my changes.​
  • I have added integration tests to cover my changes.​
  • I have added the description of my changes into CHANGELOG.​
  • I have updated the documentation accordingly

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2024 Intel Corporation
#
# SPDX-License-Identifier: MIT

@sooahleex sooahleex marked this pull request as ready for review June 17, 2024 07:46
@sooahleex sooahleex requested review from a team as code owners June 17, 2024 07:46
@sooahleex sooahleex requested review from vinnamkim and removed request for a team June 17, 2024 07:46
Copy link

codecov bot commented Jun 17, 2024

Codecov Report

Attention: Patch coverage is 73.68421% with 5 lines in your changes missing coverage. Please review.

Please upload report for BASE (releases/1.8.0@35e87c2). Learn more about missing BASE report.

Files Patch % Lines
src/datumaro/components/dataset.py 25.00% 2 Missing and 1 partial ⚠️
src/datumaro/plugins/transforms.py 86.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@                Coverage Diff                @@
##             releases/1.8.0    #1533   +/-   ##
=================================================
  Coverage                  ?   81.05%           
=================================================
  Files                     ?      276           
  Lines                     ?    32373           
  Branches                  ?     6580           
=================================================
  Hits                      ?    26239           
  Misses                    ?     4687           
  Partials                  ?     1447           
Flag Coverage Δ
ubuntu-20.04_Python-3.10 81.04% <73.68%> (?)
windows-2022_Python-3.10 81.02% <73.68%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sooahleex sooahleex added this to the 1.8.0 milestone Jun 21, 2024
@sooahleex sooahleex changed the base branch from develop to releases/1.8.0 June 21, 2024 09:07
itrushkin and others added 6 commits June 22, 2024 06:50
<!-- Contributing guide:
https://github.com/openvinotoolkit/datumaro/blob/develop/CONTRIBUTING.md
-->

### Summary
CVS-143460
<!--
Resolves openvinotoolkit#111 and openvinotoolkit#222.
Depends on openvinotoolkit#1000 (for series of dependent commits).

This PR introduces this capability to make the project better in this
and that.

- Added this feature
- Removed that feature
- Fixed the problem openvinotoolkit#1234
-->

### How to test
<!-- Describe the testing procedure for reviewers, if changes are
not fully covered by unit tests or manual testing can be complicated.
-->

### Checklist
<!-- Put an 'x' in all the boxes that apply -->
- [x] I have added unit tests to cover my changes.​
- [ ] I have added integration tests to cover my changes.​
- [ ] I have added the description of my changes into
[CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).​
- [ ] I have updated the
[documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs)
accordingly

### License

- [x] I submit _my code changes_ under the same [MIT
License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE)
that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example
below).

```python
# Copyright (C) 2024 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>
@sooahleex sooahleex force-pushed the bugfix/clean_transform_for_tabular_ann branch from 9ccf4a2 to 8d89e59 Compare June 24, 2024 07:59
for key in refined_media.data.keys()
if ann.caption.startswith(key)
]
ann = ann.wrap(caption=value[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed that value has always one element?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we get value through the above lines, caption would have only one value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value comes from this for-loop for key in refined_media.data.keys(). How did you guarantee it is singular?

Copy link
Contributor Author

@sooahleex sooahleex Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, the expectation is to do AstypeAnnotations on a tabular dataset before the Clean transformation. So if the annotation type is Caption, then it should have only one of them. We've refined the tabular annotation, so if "Title" is "It's really good!", we've transformed it to "Title:It's really good! We have not yet covered other normal image datasets that are not refined tabular datasets. So if the annotation type is caption, it should be singular. To cover other normal image datasets, I will come back to this.

Copy link
Contributor

@vinnamkim vinnamkim Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems an implicit condition in Datumaro. What I want is to make everything explicit if possible. Relying on the implicit condition can lead to a bug. If someone develops something that can changes the implicit condition, he must never be noticed what he is doing can affect to the behavior of this code. It leads to a bad result in software maintenance. Add something like #1533 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I added https://github.com/sooahleex/datumaro/blob/f6d5c3902d59e7f2d84ad8ef74edf3dfad2c9da6/src/datumaro/plugins/transforms.py#L1979-L1982. If this is not enough to show the implicit condition, please let me know. I will consider it.

for ann in item.annotations:
if isinstance(ann, Tabular):
annotation_values = {
key: refined_media.data[key] for key in item.annotations[0].values.keys()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason that we do not care item.annotations[i].values for i=1,... not just i=0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a dataset item contains a tabular annotation type, it will contain only one annotation of tabular for each item. This item will not contain any other annotations. I think this part is a bit hard-coded, so if there is another way to check the annotation type, please let me know.

Comment on lines 1988 to 1989
if isinstance(ann, Tabular):
annotation_values = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to put in a kind of checker here such as

Suggested change
if isinstance(ann, Tabular):
annotation_values = {
if isinstance(ann, Tabular):
if len(item.annotations) != 1:
msg = ...
raise ValueError(msg)
annotation_values = {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this through afd56c5. Thank you for your suggestion.

@sooahleex sooahleex merged commit 6a50cdb into openvinotoolkit:releases/1.8.0 Jun 25, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants