Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mark datumaro library version when exporting as datumaro #842

Merged

Conversation

bonhunko
Copy link
Contributor

@bonhunko bonhunko commented Mar 8, 2023

Summary

Resolves https://jira.devtools.intel.com/browse/CVS-105523

This PR adds the datumaro format version into an exported dataset.

How to test

  1. export a dataset as datumaro format
  2. datumaro plugin will choose the proper reader of legacy datumaro format version

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

@bonhunko bonhunko force-pushed the feature/add_version_field_to_datumaro branch 2 times, most recently from a252a3c to fd84f0b Compare March 8, 2023 06:15
@codecov-commenter
Copy link

codecov-commenter commented Mar 8, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.05 🎉

Comparison is base (a44fc79) 78.47% compared to head (31e9d8b) 78.53%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #842      +/-   ##
===========================================
+ Coverage    78.47%   78.53%   +0.05%     
===========================================
  Files          196      196              
  Lines        24179    24216      +37     
  Branches      4906     4908       +2     
===========================================
+ Hits         18974    19017      +43     
+ Misses        4114     4110       -4     
+ Partials      1091     1089       -2     
Flag Coverage Δ
macos-11_Python-3.8 77.87% <100.00%> (+0.05%) ⬆️
ubuntu-20.04_Python-3.8 78.52% <100.00%> (+0.05%) ⬆️
windows-2019_Python-3.8 78.46% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
datumaro/plugins/data_formats/datumaro/base.py 95.03% <100.00%> (+1.95%) ⬆️
datumaro/plugins/data_formats/datumaro/exporter.py 95.61% <100.00%> (ø)
datumaro/plugins/data_formats/datumaro/format.py 100.00% <100.00%> (ø)
...umaro/plugins/data_formats/datumaro_binary/base.py 92.95% <100.00%> (+1.02%) ⬆️
...o/plugins/data_formats/datumaro_binary/exporter.py 100.00% <100.00%> (+3.03%) ⬆️

... and 3 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@bonhunko bonhunko force-pushed the feature/add_version_field_to_datumaro branch 2 times, most recently from 12626c6 to 33912c2 Compare March 8, 2023 06:55
datumaro/components/dataset.py Outdated Show resolved Hide resolved
datumaro/components/importer.py Outdated Show resolved Hide resolved
@vinnamkim vinnamkim added ENHANCE Enhancement of existing features data formats PR is related to dataset formats labels Mar 9, 2023
@vinnamkim vinnamkim added this to the 1.1.0 milestone Mar 9, 2023
@bonhunko bonhunko force-pushed the feature/add_version_field_to_datumaro branch 4 times, most recently from 2ce0e3d to 1e39725 Compare March 10, 2023 01:42
@bonhunko bonhunko requested a review from vinnamkim March 10, 2023 01:43
@bonhunko bonhunko force-pushed the feature/add_version_field_to_datumaro branch 2 times, most recently from cbdeffd to 83e2d36 Compare March 10, 2023 01:47
@bonhunko
Copy link
Contributor Author

I re-organized code as @vinnamkim suggested, please review this PR.

@bonhunko bonhunko force-pushed the feature/add_version_field_to_datumaro branch 6 times, most recently from 7ca5580 to 46787c3 Compare March 10, 2023 03:46
datumaro/plugins/data_formats/datumaro/base.py Outdated Show resolved Hide resolved
Comment on lines 244 to 273
@mark_requirement(Requirements.DATUM_GENERAL_REQ)
def test_export_dm_version(self, test_dir):
dataset = Dataset.from_iterable(
[DatasetItem(id=1, media=Image(data=np.ones((5, 4, 3))))], media_type=Image
)

legacy_dataset_path = "./tests/assets/datumaro_dataset"
for annot_file in glob.glob(
osp.join(
legacy_dataset_path,
DatumaroPath.ANNOTATIONS_DIR,
"**",
"*" + DatumaroPath.ANNOTATION_EXT,
),
recursive=True,
):
dm_base = DatumaroBase(path=annot_file)
assert (
dm_base._get_dm_format_version(dm_base._parsed_anns) == DatumaroBase.LEGACY_VERSION
)

dataset.export(save_dir=test_dir, format="datumaro")
for annot_file in glob.glob(
osp.join(
test_dir, DatumaroPath.ANNOTATIONS_DIR, "**", "*" + DatumaroPath.ANNOTATION_EXT
),
recursive=True,
):
dm_base = DatumaroBase(path=annot_file)
assert dm_base._get_dm_format_version(dm_base._parsed_anns) == DATUMARO_FORMAT_VERSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this TC is already covered by the existing unit tests so that we can remove this. if you want to make a new TC, I recommend testing the broken version compatibility. For example,

@pytest.fixture
def fxt_wrong_version_dir(fxt_test_datumaro_format_dataset, test_dir):
     dest_dir = osp.join(test_dir, "wrong_version")
     fxt_test_datumaro_format_dataset.export(dest_dir, "datumaro")
     < Somehow edit the annotation file to give some weird version to it >
     yield dest_dir

def test_version_compatibilty(self, fxt_wrong_version_dir):
     with pytest.raises(DatasetImportError):
            Dataset.import_from(fxt_wrong_version_dir, "datumaro")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is checking that
exported dm_dataset files have current datumaro format version or not.

As far as I know, existing unit tests doesn't check this.
if I missed some existing tests, please let me know.

BTW, testing the broken version compatibility is a good idea. I'll add the unit test.

Copy link
Contributor

@vinnamkim vinnamkim Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the version compatibility is met, the dataset will be normally parsed without any error, and this is covered by the existing test. I think that code quality is important not only for the source but also for the test code (https://www.springboottutorial.com/code-quality-what-is-code-duplication).

Copy link
Contributor Author

@bonhunko bonhunko Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If somehow datumaro version string is not stored in the exported dataset,

when exporting back, this dataset will be recognized as 'legacy' which is not appreciable but will be parsed well anyway.
Thus I think that we should check whether the exported datumaro has the current datumaro format version well or not.

Copy link
Contributor

@vinnamkim vinnamkim Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, we need to satisfy the following two things after this PR.

  1. Succeed to import the dataset without version info (legacy, what has been used)
  2. Succeed to import the dataset with version = 1.0 (to-be)
  3. Fail to import the dataset with version > 1.0

2) will be tested by the existing export and import TCs.

I thought that there has existed a TC for 1) but it doesn't. Sorry for my mistake.

So, it would be better to move tests/assets/datumaro_dataset -> tests/assets/datumaro_dataset/legacy, create a TC to import from tests/assets/datumaro_dataset/legacy and conduct compare_datasets(). For example,

def test_can_import(
self,
fxt_dataset_dir: str,
fxt_expected_dataset: Dataset,
fxt_import_kwargs: Dict[str, Any],
request: pytest.FixtureRequest,
):
return super().test_can_import(
fxt_dataset_dir, fxt_expected_dataset, fxt_import_kwargs, request
)

It is not appealing to call dm_base._get_dm_format_version() because it's a private function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added test_version_compatibilty() and 1) tests as you suggested. Please have a look

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>
@bonhunko bonhunko force-pushed the feature/add_version_field_to_datumaro branch 2 times, most recently from c90adcf to 98ddc50 Compare March 10, 2023 06:53
@bonhunko bonhunko force-pushed the feature/add_version_field_to_datumaro branch from 3d04f45 to 832e7fe Compare March 13, 2023 02:29
@bonhunko
Copy link
Contributor Author

@vinnamkim after rebase to the lastest develop, I added version field to datumaro_binary format.
Please review this change which is not included previously 98ddc50

@bonhunko bonhunko requested a review from vinnamkim March 13, 2023 03:56
Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>
@bonhunko bonhunko force-pushed the feature/add_version_field_to_datumaro branch from 890b98f to 31e9d8b Compare March 13, 2023 04:29
Copy link
Contributor

@vinnamkim vinnamkim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@bonhunko bonhunko merged commit 9586356 into openvinotoolkit:develop Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data formats PR is related to dataset formats ENHANCE Enhancement of existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants