Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add with_subset_dirs decorator #816

Merged

Conversation

vinnamkim
Copy link
Contributor

@vinnamkim vinnamkim commented Feb 20, 2023

Summary

  • Ticket no. 104802
  • Please see (TEMP) Datumaro Dataset CLI Guide by @sungmanc. It describes how to input the data path when using OTX CLI. However, the current state is that there are different rules according to task types, which can cause confusion to users. For example, Multi-class Classification task requires otx train [${template}] --train-data-roots ${root}/train --val-data-roots ${root}/val which needs the directory path at the subset level but Multi-label Classification task requires only the root level directory path such as otx train [${template}] --train-data-roots ${root} --val-data-roots ${root}. This should be resolved from Datumaro side.
  • In this PR, we introduce @with_subset_dirs decorator to make the existing importer to handle the following directory structure including subsets:
root
  - train
     - {dataset}
  - val
     - {dataset}
  - test
     - {dataset}
  • Using @with_subset_dirs, ImagenetWithSubsetDirsImporter will cover the Multi-class Classification task case.

How to test

I added unit tests to cover this feature.

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
@vinnamkim vinnamkim marked this pull request as ready for review February 20, 2023 02:55
Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
@codecov-commenter
Copy link

codecov-commenter commented Feb 20, 2023

Codecov Report

Base: 78.26% // Head: 78.26% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (ba8cb54) compared to base (fd4d695).
Patch coverage: 84.74% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #816   +/-   ##
========================================
  Coverage    78.26%   78.26%           
========================================
  Files          184      184           
  Lines        23563    23614   +51     
  Branches      4880     4892   +12     
========================================
+ Hits         18442    18482   +40     
- Misses        4028     4033    +5     
- Partials      1093     1099    +6     
Flag Coverage Δ
macos-11_Python-3.8 ?
ubuntu-20.04_Python-3.8 78.25% <84.74%> (+<0.01%) ⬆️
windows-2019_Python-3.8 78.19% <84.74%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
datumaro/components/importer.py 89.43% <80.43%> (-5.51%) ⬇️
datumaro/components/dataset.py 82.42% <100.00%> (ø)
datumaro/components/environment.py 90.90% <100.00%> (+0.14%) ⬆️
datumaro/plugins/data_formats/imagenet.py 90.14% <100.00%> (+0.58%) ⬆️
datumaro/plugins/synthetic_data/utils.py 83.33% <0.00%> (-1.67%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@vinnamkim vinnamkim added ENHANCE Enhancement of existing features data formats PR is related to dataset formats labels Feb 20, 2023
@vinnamkim vinnamkim added this to the 1.1.0 milestone Feb 20, 2023
@vinnamkim
Copy link
Contributor Author

Please see #817 (comment), this PR should be blocked until the backport of v1.0.0.

@sungmanc
Copy link

What a nice feature, it must improve the user experience. Thanks

 - Move from v1.0.0 to v1.1.0

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
@vinnamkim
Copy link
Contributor Author

Please see #817 (comment), this PR should be blocked until the backport of v1.0.0.

Done. It's ready for review now.

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
datumaro/components/dataset.py Show resolved Hide resolved
datumaro/components/format_detection.py Outdated Show resolved Hide resolved
datumaro/plugins/data_formats/imagenet.py Show resolved Hide resolved
datumaro/components/importer.py Show resolved Hide resolved
Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
Copy link
Contributor

@wonjuleee wonjuleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but could you provide some examples for using with_subset_dirs more than ImageNet cases?

@vinnamkim
Copy link
Contributor Author

Looks good, but could you provide some examples for using with_subset_dirs more than ImageNet cases?

After this PR. I'll do a next work to plugin with_subset_dirs to Common Semantic Segmentation format.

image

More works need to investigate another format which we can plugin with_subset_dirs.

@wonjuleee
Copy link
Contributor

wonjuleee commented Feb 23, 2023

Looks good, but could you provide some examples for using with_subset_dirs more than ImageNet cases?

After this PR. I'll do a next work to plugin with_subset_dirs to Common Semantic Segmentation format.

image

More works need to investigate another format which we can plugin with_subset_dirs.

It must be good!

@vinnamkim vinnamkim merged commit 4624d59 into openvinotoolkit:develop Feb 23, 2023
@vinnamkim vinnamkim deleted the add-with-subset-dirs-decorator branch February 23, 2023 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data formats PR is related to dataset formats ENHANCE Enhancement of existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants