
Add more dataset format detectors #536

Merged: 6 commits merged into openvinotoolkit:develop from more-detectors on Nov 10, 2021

Conversation


@IRDonch commented Nov 3, 2021

Summary

Add detectors for a few more dataset formats:

  • ade20k2017
  • ade20k2020
  • cvat
  • datumaro
  • icdar_text_segmentation
  • icdar_word_recognition
  • kitti_raw
  • label_me
  • mot_seq
  • yolo

To support them, add a mechanism for placing requirements on file contents.
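Such a mechanism can be pictured roughly as follows. This is a minimal sketch: the class and method names (`DetectionContext`, `require_file`, `probe_text_file`) are assumptions modeled on the description above, not the actual Datumaro API.

```python
import contextlib
import glob
import os


class FormatRequirementsUnmet(Exception):
    """The dataset under inspection does not meet a format requirement."""


class DetectionContext:
    """Hypothetical helper holding the dataset root and requirement methods."""

    def __init__(self, root):
        self.root = root

    def require_file(self, pattern):
        # Require at least one file matching the glob pattern; return one match.
        matches = glob.glob(os.path.join(self.root, pattern))
        if not matches:
            raise FormatRequirementsUnmet(f"no file matches {pattern!r}")
        return matches[0]

    @contextlib.contextmanager
    def probe_text_file(self, path, requirement_desc):
        # Open a text file; any probing failure raised inside the `with` block
        # is converted into FormatRequirementsUnmet carrying requirement_desc,
        # so one bad file fails this format's detection "softly".
        try:
            with open(path, encoding="utf-8") as f:
                yield f
        except (OSError, UnicodeDecodeError, ValueError):
            raise FormatRequirementsUnmet(requirement_desc) from None


def detect_yolo_like(context):
    # Toy detector: require an obj.data file whose non-empty lines
    # look like `key = value`. Illustrative only.
    path = context.require_file("obj.data")
    with context.probe_text_file(
        path, "obj.data must consist of key = value lines"
    ) as f:
        for line in f:
            if line.strip() and "=" not in line:
                raise ValueError(line)  # converted to FormatRequirementsUnmet
```

A detector that raises `FormatRequirementsUnmet` signals "this dataset is not in my format" without aborting the overall detection run.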

How to test

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

Comment on lines +93 to +98
Requirement-placing methods that use this to verify their arguments
should raise a FormatRequirementsUnmet rather than a "hard" error like
AssertionError if False is returned. The reason is that the path passed
by the detector might not have been hardcoded, and instead might have
been acquired from another file in the dataset. In that case, an invalid
pattern signifies a problem with the dataset, not with the detector.
Contributor

Maybe we need to raise an invalid-requirement error in such cases?

Author

What do you mean by that? A new exception type? How would it be handled?

Contributor

Raising FormatRequirementsUnmet doesn't seem correct when incorrect requirements are specified. But I agree that the whole detection process shouldn't be interrupted in such cases. So, probably, just another error can be introduced. I suppose, we can come up with:
DatasetDetectionError
^- FormatRequirementsUnmet
^- InvalidRequirement

And just catch and return DatasetDetectionErrors.
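For illustration, the proposed hierarchy and the "catch and report" idea could be sketched like this; the class names come from the comment above, while `run_detector` is a hypothetical driver, not the merged code:

```python
class DatasetDetectionError(Exception):
    """Base class: any failure of one format's detection attempt."""


class FormatRequirementsUnmet(DatasetDetectionError):
    """The dataset does not satisfy the format's requirements."""


class InvalidRequirement(DatasetDetectionError):
    """The detector itself placed a malformed requirement."""


def run_detector(detector):
    # Run one detector; convert any detection error into a returned value,
    # so the overall detection process is not interrupted.
    try:
        detector()
    except DatasetDetectionError as e:
        return e  # the caller can inspect or log the failure
    return None   # no error: the format's requirements were met
```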

Author

So, probably, just another error can be introduced.

That will cause a problem later with the addition of alternatives. If one alternative raises a FormatRequirementsUnmet, and another raises an InvalidRequirement, then what exception should the detector as a whole raise?

Furthermore, I predict that "incorrect requirements" will, in practice, be caused by the dataset not meeting other format requirements (like in the scenario explained in the comment: when a value is read from a file that's supposed to be a path, but isn't), and therefore it's actually reasonable to report them as unmet requirements.

The only other possible cause that I can see would be an invalid requirement that is hardcoded into the detector, but such a detector will always fail, so an error like that would be caught in testing and corrected.

Contributor

That will cause a problem later with the addition of alternatives. If one alternative raises a FormatRequirementsUnmet, and another raises an InvalidRequirement, then what exception should the detector as a whole raise?

I don't think I understand the way you want to implement alternatives, but unless they are implemented, the question makes no sense. From my perspective, we will iterate over alternatives and then return a list of errors. If there is a matching one, the confidence is returned.

Maybe, if there are incorrect requirements, we need to stop checking the format altogether: print a debug log message / a warning and return the NONE confidence. I expect such situations to happen only in custom plugins, or during development. So maybe we can just fail, actually.
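The "iterate over alternatives, collect errors, return the confidence if one matches" idea from this exchange might be sketched as follows (hypothetical names and confidence values, not the merged implementation):

```python
NONE_CONFIDENCE = 0
MEDIUM_CONFIDENCE = 1


class FormatRequirementsUnmet(Exception):
    """One alternative's requirements were not satisfied."""


def detect_with_alternatives(alternatives):
    # alternatives: callables that raise FormatRequirementsUnmet on failure.
    # Try each alternative in turn; collect failures, and report a positive
    # confidence as soon as any alternative matches.
    errors = []
    for alternative in alternatives:
        try:
            alternative()
        except FormatRequirementsUnmet as e:
            errors.append(e)
        else:
            return MEDIUM_CONFIDENCE, errors
    return NONE_CONFIDENCE, errors
```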

Roman Donchenko added 5 commits November 10, 2021 15:46
Currently, such requirements are described by a single requirement string
(the one specified in the call to `probe_text_file`). I considered making it
so that you could specify a separate requirement string for each test done
in the prober context (e.g. "must be an XML file"; "must have `annotations`
as the root element"), but that seems cumbersome to use and not terribly
important. If needed, this functionality could be added later (for example,
we could add a method on the context that will tell it to use a more specific
message for the next exception thrown).

Unfortunately, JSON files can't really be iteratively parsed (because
object keys can be stored in any order), so when we need to probe the
contents of such files, we have to parse the entire file.
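Because key order in a JSON object is arbitrary, a streaming probe can't reliably stop early, so the whole file is parsed. Roughly (an illustrative check with an assumed `items` key, not the real Datumaro probe):

```python
import json


def probe_json_file(path, required_key):
    # Return True if the file parses as JSON and its top-level object has
    # required_key. json.load parses the entire file: there is no way to
    # stop early, since required_key could be the last key written.
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        return False
    return isinstance(data, dict) and required_key in data
```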

Previously the subtests in `test_can_parse` checked what happens if a
revpath with no format specified was ambiguous due to multiple detected
formats. However, since the addition of a precise detector for the Datumaro
format, that revpath is not ambiguous anymore. Add a separate test that uses
a dataset that deliberately mixes annotations from different formats.
Contributor

@zhiltsov-max left a comment

Let's merge after the mapillary problem is resolved.

@IRDonch IRDonch merged commit a027132 into openvinotoolkit:develop Nov 10, 2021
@IRDonch IRDonch deleted the more-detectors branch November 10, 2021 17:48
zhiltsov-max pushed a commit that referenced this pull request Nov 12, 2021