Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import for ADE20K dataset #400

Merged
merged 55 commits into from
Aug 20, 2021
Merged

Import for ADE20K dataset #400

merged 55 commits into from
Aug 20, 2021

Conversation

sizov-kirill
Copy link

@sizov-kirill sizov-kirill commented Aug 5, 2021

Summary

Resolved #399.

How to test

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

@sizov-kirill sizov-kirill changed the title Import for ADE20K dataset [WIP] Import for ADE20K dataset Aug 5, 2021
@sizov-kirill sizov-kirill changed the title [WIP] Import for ADE20K dataset Import for ADE20K dataset Aug 6, 2021
Copy link
Contributor

@zhiltsov-max zhiltsov-max left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check if the imported dataset can be converted into some other format? Pascal VOC or other.


if Ade20k2020Path.MASK_IMAGE_PATTERN.search(item_id):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, only the basename should be checked. Also, consider using re.fullmatch to match the whole name.

Copy link
Author

@sizov-kirill sizov-kirill Aug 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will change the item_id to the basename, but I don't think that we should use re.,fullmatch here, because we don't know full name for instance mask (that should be skipped). We only recognize it after parsing the annotation file.

@sizov-kirill
Copy link
Author

Could you check if the imported dataset can be converted into some other format? Pascal VOC or other.

I checked converting with VOC, COCO and KITTI formats. Conversion to COCO failed, but I think it's no related with this PR, the problem occurs here:

masks = (m.image for m in masks)
if mask is not None:
masks += chain(masks, [mask])

masks has type generator, but then we use += between generator and itertools.chain Python doesn't support such concatenation. Or I'm wrong?

MASK_PATTERN = re.compile(r'''\w+_seg\.\w+
| \w+_parts_\d+\.\w+
| instance_\w+\.\w+
''', re.VERBOSE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't \w+ too restrictive? It won't accept digits and special characters like - and _.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll add - and spaces, regarding digits and underscores In the docs says that \w accepts them.

@zhiltsov-max
Copy link
Contributor

zhiltsov-max commented Aug 20, 2021

masks has type generator, but then we use += between generator and itertools.chain Python doesn't support such concatenation. Or I'm wrong?

Yes, looks like a bug in COCO. Can you fix it?

@sizov-kirill
Copy link
Author

masks has type generator, but then we use += between generator and itertools.chain Python doesn't support such concatenation. Or I'm wrong?

Yes, looks like a bug in COCO. Can you fix it?

Yes, I'll fix it in new PR.

Comment on lines 23 to 25
MASK_PATTERN = re.compile(r'''[\w|\s|-]+_seg\.\w+
| [\w|\s|-]+_parts_\d+\.\w+
''', re.VERBOSE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MASK_PATTERN = re.compile(r'''[\w|\s|-]+_seg\.\w+
| [\w|\s|-]+_parts_\d+\.\w+
''', re.VERBOSE)
MASK_PATTERN = re.compile(r'''
.+_seg
| .+_parts_\d+
''', re.VERBOSE)

Comment on lines 24 to 26
MASK_PATTERN = re.compile(r'''[\w|\s|-]+_seg\.\w+
| [\w|\s|-]+_parts_\d+\.\w+
| instance_[\w|\s|-]+\.\w+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MASK_PATTERN = re.compile(r'''[\w|\s|-]+_seg\.\w+
| [\w|\s|-]+_parts_\d+\.\w+
| instance_[\w|\s|-]+\.\w+
MASK_PATTERN = re.compile(r'''.+_seg
| .+_parts_\d+
| instance_.+

for image_path in sorted(images):
item_id = osp.splitext(osp.relpath(image_path, path))[0]

if Ade20k2020Path.MASK_PATTERN.fullmatch(osp.basename(image_path)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if Ade20k2020Path.MASK_PATTERN.fullmatch(osp.basename(image_path)):
if Ade20k2020Path.MASK_PATTERN.fullmatch(osp.basename(image_id)):

for image_path in sorted(images):
item_id = osp.splitext(osp.relpath(image_path, path))[0]

if Ade20k2017Path.MASK_PATTERN.fullmatch(osp.basename(image_path)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if Ade20k2017Path.MASK_PATTERN.fullmatch(osp.basename(image_path)):
if Ade20k2017Path.MASK_PATTERN.fullmatch(osp.basename(image_id)):

@zhiltsov-max zhiltsov-max merged commit f1df870 into develop Aug 20, 2021
@zhiltsov-max zhiltsov-max deleted the sk/ade20k branch August 20, 2021 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement import for ADE20K dataset
2 participants