mark datumaro library version when exporting as datumaro #842

bonhunko · 2023-03-08T06:05:57Z

Summary

Resolves https://jira.devtools.intel.com/browse/CVS-105523

This PR adds the datumaro format version into an exported dataset.

How to test

export a dataset as datumaro format
datumaro plugin will choose the proper reader of legacy datumaro format version

Checklist

I submit my changes into the develop branch
I have added description of my changes into CHANGELOG
I have updated the documentation accordingly
I have added tests to cover my changes
I have linked related issues

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.
I have updated the license header for each file (see an example below)

# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

codecov-commenter · 2023-03-08T06:16:12Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.05 🎉

Comparison is base (a44fc79) 78.47% compared to head (31e9d8b) 78.53%.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #842      +/-   ##
===========================================
+ Coverage    78.47%   78.53%   +0.05%     
===========================================
  Files          196      196              
  Lines        24179    24216      +37     
  Branches      4906     4908       +2     
===========================================
+ Hits         18974    19017      +43     
+ Misses        4114     4110       -4     
+ Partials      1091     1089       -2

Flag	Coverage Δ
macos-11_Python-3.8	`77.87% <100.00%> (+0.05%)`	⬆️
ubuntu-20.04_Python-3.8	`78.52% <100.00%> (+0.05%)`	⬆️
windows-2019_Python-3.8	`78.46% <100.00%> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
datumaro/plugins/data_formats/datumaro/base.py	`95.03% <100.00%> (+1.95%)`	⬆️
datumaro/plugins/data_formats/datumaro/exporter.py	`95.61% <100.00%> (ø)`
datumaro/plugins/data_formats/datumaro/format.py	`100.00% <100.00%> (ø)`
...umaro/plugins/data_formats/datumaro_binary/base.py	`92.95% <100.00%> (+1.02%)`	⬆️
...o/plugins/data_formats/datumaro_binary/exporter.py	`100.00% <100.00%> (+3.03%)`	⬆️

... and 3 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

datumaro/components/dataset.py

datumaro/components/importer.py

bonhunko · 2023-03-10T01:50:50Z

I re-organized code as @vinnamkim suggested, please review this PR.

datumaro/plugins/data_formats/datumaro/base.py

datumaro/plugins/data_formats/datumaro/exporter.py

datumaro/plugins/data_formats/datumaro/base.py

vinnamkim · 2023-03-10T04:05:50Z

tests/unit/data_formats/datumaro/test_datumaro_format.py

+    @mark_requirement(Requirements.DATUM_GENERAL_REQ)
+    def test_export_dm_version(self, test_dir):
+        dataset = Dataset.from_iterable(
+            [DatasetItem(id=1, media=Image(data=np.ones((5, 4, 3))))], media_type=Image
+        )
+
+        legacy_dataset_path = "./tests/assets/datumaro_dataset"
+        for annot_file in glob.glob(
+            osp.join(
+                legacy_dataset_path,
+                DatumaroPath.ANNOTATIONS_DIR,
+                "**",
+                "*" + DatumaroPath.ANNOTATION_EXT,
+            ),
+            recursive=True,
+        ):
+            dm_base = DatumaroBase(path=annot_file)
+            assert (
+                dm_base._get_dm_format_version(dm_base._parsed_anns) == DatumaroBase.LEGACY_VERSION
+            )
+
+        dataset.export(save_dir=test_dir, format="datumaro")
+        for annot_file in glob.glob(
+            osp.join(
+                test_dir, DatumaroPath.ANNOTATIONS_DIR, "**", "*" + DatumaroPath.ANNOTATION_EXT
+            ),
+            recursive=True,
+        ):
+            dm_base = DatumaroBase(path=annot_file)
+            assert dm_base._get_dm_format_version(dm_base._parsed_anns) == DATUMARO_FORMAT_VERSION


I think this TC is already covered by the existing unit tests so that we can remove this. if you want to make a new TC, I recommend testing the broken version compatibility. For example,

@pytest.fixture def fxt_wrong_version_dir(fxt_test_datumaro_format_dataset, test_dir): dest_dir = osp.join(test_dir, "wrong_version") fxt_test_datumaro_format_dataset.export(dest_dir, "datumaro") < Somehow edit the annotation file to give some weird version to it > yield dest_dir def test_version_compatibilty(self, fxt_wrong_version_dir): with pytest.raises(DatasetImportError): Dataset.import_from(fxt_wrong_version_dir, "datumaro")

This test is checking that
exported dm_dataset files have current datumaro format version or not.

As far as I know, existing unit tests doesn't check this.
if I missed some existing tests, please let me know.

BTW, testing the broken version compatibility is a good idea. I'll add the unit test.

If the version compatibility is met, the dataset will be normally parsed without any error, and this is covered by the existing test. I think that code quality is important not only for the source but also for the test code (https://www.springboottutorial.com/code-quality-what-is-code-duplication).

If somehow datumaro version string is not stored in the exported dataset,

when exporting back, this dataset will be recognized as 'legacy' which is not appreciable but will be parsed well anyway.
Thus I think that we should check whether the exported datumaro has the current datumaro format version well or not.

Basically, we need to satisfy the following two things after this PR.

Succeed to import the dataset without version info (legacy, what has been used)

Succeed to import the dataset with version = 1.0 (to-be)

Fail to import the dataset with version > 1.0

2) will be tested by the existing export and import TCs.

I thought that there has existed a TC for 1) but it doesn't. Sorry for my mistake.

So, it would be better to move tests/assets/datumaro_dataset -> tests/assets/datumaro_dataset/legacy, create a TC to import from tests/assets/datumaro_dataset/legacy and conduct compare_datasets(). For example,

datumaro/tests/unit/data_formats/test_common_semantic_segmentation_format.py

Lines 135 to 144 in a44fc79

def test_can_import(

self,

fxt_dataset_dir: str,

fxt_expected_dataset: Dataset,

fxt_import_kwargs: Dict[str, Any],

request: pytest.FixtureRequest,

):

return super().test_can_import(

fxt_dataset_dir, fxt_expected_dataset, fxt_import_kwargs, request

)

It is not appealing to call dm_base._get_dm_format_version() because it's a private function.

I added test_version_compatibilty() and 1) tests as you suggested. Please have a look

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>

bonhunko · 2023-03-13T03:55:53Z

@vinnamkim after rebase to the lastest develop, I added version field to datumaro_binary format.
Please review this change which is not included previously 98ddc50

datumaro/plugins/data_formats/datumaro_binary/base.py

datumaro/plugins/data_formats/datumaro_binary/exporter.py

datumaro/plugins/data_formats/datumaro_binary/base.py

datumaro/plugins/data_formats/datumaro_binary/exporter.py

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>

vinnamkim

LGTM.

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>

bonhunko force-pushed the feature/add_version_field_to_datumaro branch 2 times, most recently from a252a3c to fd84f0b Compare March 8, 2023 06:15

bonhunko force-pushed the feature/add_version_field_to_datumaro branch 2 times, most recently from 12626c6 to 33912c2 Compare March 8, 2023 06:55

bonhunko requested review from vinnamkim, wonjuleee, sooahleex and chuneuny-emily March 8, 2023 07:22

vinnamkim reviewed Mar 8, 2023

View reviewed changes

datumaro/components/dataset.py Outdated Show resolved Hide resolved

datumaro/components/importer.py Outdated Show resolved Hide resolved

vinnamkim added ENHANCE Enhancement of existing features data formats PR is related to dataset formats labels Mar 9, 2023

vinnamkim added this to the 1.1.0 milestone Mar 9, 2023

bonhunko force-pushed the feature/add_version_field_to_datumaro branch 4 times, most recently from 2ce0e3d to 1e39725 Compare March 10, 2023 01:42

bonhunko requested a review from vinnamkim March 10, 2023 01:43

bonhunko force-pushed the feature/add_version_field_to_datumaro branch 2 times, most recently from cbdeffd to 83e2d36 Compare March 10, 2023 01:47

vinnamkim reviewed Mar 10, 2023

View reviewed changes

datumaro/plugins/data_formats/datumaro/base.py Outdated Show resolved Hide resolved

datumaro/plugins/data_formats/datumaro/exporter.py Outdated Show resolved Hide resolved

datumaro/plugins/data_formats/datumaro/base.py Outdated Show resolved Hide resolved

bonhunko force-pushed the feature/add_version_field_to_datumaro branch 6 times, most recently from 7ca5580 to 46787c3 Compare March 10, 2023 03:46

vinnamkim reviewed Mar 10, 2023

View reviewed changes

Update datumaro/plugins/data_formats/datumaro/base.py

c827b3a

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>

bonhunko force-pushed the feature/add_version_field_to_datumaro branch 2 times, most recently from c90adcf to 98ddc50 Compare March 10, 2023 06:53

bonhunko and others added 6 commits March 10, 2023 20:15

mark datumaro library version when exporting as datumaro

ea3d884

Update datumaro/plugins/data_formats/datumaro/base.py

afcbd6f

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>

use cache to remove duplicated json parsing

46787c3

Update datumaro/plugins/data_formats/datumaro/base.py

703dbdc

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>

reform backward compatibility structure

47498d5

add version field to datumaro_binary format

98ddc50

bonhunko force-pushed the feature/add_version_field_to_datumaro branch from 3d04f45 to 832e7fe Compare March 13, 2023 02:29

bonhunko requested a review from vinnamkim March 13, 2023 03:56

vinnamkim reviewed Mar 13, 2023

View reviewed changes

Update datumaro/plugins/data_formats/datumaro_binary/base.py

b3bea0d

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>

bonhunko force-pushed the feature/add_version_field_to_datumaro branch from 890b98f to 31e9d8b Compare March 13, 2023 04:29

vinnamkim approved these changes Mar 13, 2023

View reviewed changes

bonhunko merged commit 9586356 into openvinotoolkit:develop Mar 13, 2023

bonhunko and others added 4 commits March 13, 2023 19:39

add datumaro legacy dataset unit test

832e7fe

add unit_test for wrong version string

8a3eccc

Update datumaro/plugins/data_formats/datumaro_binary/base.py

5f8bd16

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>

unify binary hashing element type to dictionary

31e9d8b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mark datumaro library version when exporting as datumaro #842

mark datumaro library version when exporting as datumaro #842

bonhunko commented Mar 8, 2023 •

edited

Loading

codecov-commenter commented Mar 8, 2023 •

edited

Loading

bonhunko commented Mar 10, 2023

vinnamkim Mar 10, 2023

bonhunko Mar 10, 2023

vinnamkim Mar 10, 2023 •

edited

Loading

bonhunko Mar 10, 2023 •

edited

Loading

vinnamkim Mar 10, 2023 •

edited

Loading

bonhunko Mar 13, 2023

bonhunko commented Mar 13, 2023

vinnamkim left a comment

	def test_can_import(
	self,
	fxt_dataset_dir: str,
	fxt_expected_dataset: Dataset,
	fxt_import_kwargs: Dict[str, Any],
	request: pytest.FixtureRequest,
	):
	return super().test_can_import(
	fxt_dataset_dir, fxt_expected_dataset, fxt_import_kwargs, request
	)

mark datumaro library version when exporting as datumaro #842

mark datumaro library version when exporting as datumaro #842

Conversation

bonhunko commented Mar 8, 2023 • edited Loading

Summary

How to test

Checklist

License

codecov-commenter commented Mar 8, 2023 • edited Loading

Codecov Report

bonhunko commented Mar 10, 2023

vinnamkim Mar 10, 2023

Choose a reason for hiding this comment

bonhunko Mar 10, 2023

Choose a reason for hiding this comment

vinnamkim Mar 10, 2023 • edited Loading

Choose a reason for hiding this comment

bonhunko Mar 10, 2023 • edited Loading

Choose a reason for hiding this comment

vinnamkim Mar 10, 2023 • edited Loading

Choose a reason for hiding this comment

bonhunko Mar 13, 2023

Choose a reason for hiding this comment

bonhunko commented Mar 13, 2023

vinnamkim left a comment

Choose a reason for hiding this comment

bonhunko commented Mar 8, 2023 •

edited

Loading

codecov-commenter commented Mar 8, 2023 •

edited

Loading

vinnamkim Mar 10, 2023 •

edited

Loading

bonhunko Mar 10, 2023 •

edited

Loading

vinnamkim Mar 10, 2023 •

edited

Loading