
Improve COCO parsing error messages #684

Merged: 14 commits from zm/coco-parsing-errors into develop on Mar 23, 2022

Conversation

@zhiltsov-max (Contributor) commented Mar 18, 2022

Summary

  • Extended dataset parsing error classification
  • Added default string conversion for parsing errors
  • Added COCO JSON parsing errors (an illustrative sketch follows below)
  • Added tests
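
For illustration only (not part of the PR itself), the improved errors might surface to a user roughly as in the sketch below. The Dataset.import_from call, the "coco_instances" format name, and the error classes referenced here all appear elsewhere in this conversation; the annotation file name and the exact wrapping behavior are assumptions.

# Hedged sketch: catching the improved COCO parsing errors.
# Import paths follow datumaro/components/errors.py referenced in this PR;
# "instances_broken.json" is a hypothetical file.
from datumaro.components.dataset import Dataset
from datumaro.components.errors import DatasetImportError

try:
    Dataset.import_from("instances_broken.json", "coco_instances")
except DatasetImportError as e:
    # With the added default string conversion, str(e) should describe what
    # failed to parse; __cause__ may hold a more specific error such as a
    # MissingFieldError naming the missing field.
    print(e)
    print(repr(e.__cause__))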

How to test

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

@zhiltsov-max force-pushed the zm/coco-parsing-errors branch 2 times, most recently from 21b7006 to 20703c5 on March 18, 2022 17:33
(Several review threads on datumaro/components/errors.py and datumaro/plugins/coco_format/extractor.py were resolved during review.)
Comment on lines 863 to 874
anns = {
    "images": [
        {
            "id": 5,
            "width": 10,
            "height": 5,
            "file_name": "a.jpg",
        }
    ],
    "annotations": [],
    "categories": [],
}
Reviewer:

Most of these tests are very similar; I think it would be more readable to factor out the common parts into a single function that takes one parameter (a function that mangles the dataset) and returns the resulting exception. Then the individual tests could just look like this:

def mangle(anns):
    del anns["images"][0][field]

capture = self._test_mangled_annotations(mangle)
self.assertIsInstance(capture.exception.__cause__, MissingFieldError)
self.assertEqual(capture.exception.__cause__.name, field)

zhiltsov-max (Contributor, Author):

I agree about extracting the common part; I'll check if it's possible.

zhiltsov-max (Contributor, Author):

Done

Reviewer:

You extracted a shared annotation template (which is certainly a big improvement), but there's still a lot of duplication in the structure of the tests. How about factoring that out too?

zhiltsov-max (Contributor, Author):

How do you envision that?

Reviewer:

Like in my code snippet above. In full it would look something like this:

    def _load_mangled_annotations(self, mangle):
        with TestDir() as test_dir:
            ann_path = osp.join(test_dir, "ann.json")
            anns = deepcopy(self.ANNOTATION_JSON_TEMPLATE)
            mangle(anns)
            dump_json_file(ann_path, anns)

            with self.assertRaises(ItemImportError) as capture:
                Dataset.import_from(ann_path, "coco_instances")

            return capture

    def test_can_report_missing_item_field(self):
        for field in ["id", "file_name"]:
            with self.subTest(field=field):
                def mangle(anns):
                    anns["images"][0].pop(field)

                capture = self._load_mangled_annotations(mangle)
                self.assertIsInstance(capture.exception.__cause__, MissingFieldError)
                self.assertEqual(capture.exception.__cause__.name, field)

zhiltsov-max (Contributor, Author) commented Mar 23, 2022:

Maybe, but it seems to save only one line of code per test. It also requires catching DatasetImportError and checking the type and message in the test.

@zhiltsov-max mentioned this pull request on Mar 21, 2022
@@ -836,6 +848,205 @@ def test_can_pickle(self):
        compare_datasets_strict(self, source, parsed)


class CocoExtractorTests(TestCase):
    ANNOTATION_JSON_TEMPLATE = {
Reviewer:

IMO, it would be useful to have a test that loads the unmodified template, to ensure that any errors in the other tests are due to the modifications made in those tests and not due to the template itself.
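
A minimal sketch of such a test, assuming the same helpers already used in the snippets in this conversation (TestDir, osp.join, dump_json_file, deepcopy, the ANNOTATION_JSON_TEMPLATE attribute); the test name is made up:

    def test_can_load_unmodified_template(self):
        # Sanity check: the unmodified template itself must import cleanly,
        # so failures in the mangling tests come from the mangling alone.
        with TestDir() as test_dir:
            ann_path = osp.join(test_dir, "ann.json")
            dump_json_file(ann_path, deepcopy(self.ANNOTATION_JSON_TEMPLATE))

            Dataset.import_from(ann_path, "coco_instances")  # should not raise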

zhiltsov-max (Contributor, Author):

The fact that a common piece could be extracted is incidental; it is not something designed or expected. There are lots of tests that cover successful loading.

Reviewer:

The fact that a common piece could be extracted is incidental; it is not something designed or expected.

How is that not expected? The point of the new tests is to make sure that specific errors in the dataset cause the extraction to fail. To be certain that it's those errors that cause the failure, we need to ensure that the original template contains no errors.

zhiltsov-max (Contributor, Author) commented Mar 23, 2022:

It is checked by the error detail checks. However, I agree that such a test could be useful on its own, because there is no such test for plain JSON parsing.

@IRDonch merged commit 543ab1a into develop on Mar 23, 2022
@IRDonch deleted the zm/coco-parsing-errors branch on March 23, 2022 14:21