Kate/splitter cli #81

jihyeonyi · 2021-01-12T04:13:30Z

Summary

This PR:

Adds CLI support for task-specific splits
Revises the re-identification split
Updates the documentation on task-specific splits

How to test

Unittest

$ python -m unittest -v tests/test_splitter.py

Testing the classification split with the ImageNet dataset.

Note: ImageNet doesn't support subsets, but checking subsets at the project level is enough here.

$ pip install .
$ datum project create -o imagenet
$ datum source add path <path-to-source> -f imagenet -p imagenet/
$ datum project transform -t classification_split -p imagenet/ -- --subset train:.5 --subset val:.2 --subset test:.3
$ datum project info -p imagenet-classification_split
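As a rough illustration of what a classification split with these ratios aims for, the sketch below assigns items to subsets so that every label keeps approximately the requested proportions. It is not the implementation in this PR: `stratified_split` and its arguments are hypothetical names, and the sketch ignores attributes entirely.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios, seed=0):
    """Assign each item to a subset, keeping per-label proportions close to `ratios`.

    labels: dict mapping item id -> label (hypothetical input format)
    ratios: list of (subset_name, ratio) pairs whose ratios sum to 1
    """
    if abs(sum(r for _, r in ratios) - 1.0) > 1e-6:
        raise ValueError("subset ratios must sum to 1")

    rng = random.Random(seed)

    # Group item ids by label so each label is split independently.
    by_label = defaultdict(list)
    for item_id, label in labels.items():
        by_label[label].append(item_id)

    result = {}
    for label in sorted(by_label):
        items = sorted(by_label[label])
        rng.shuffle(items)
        start = 0
        cumulative = 0.0
        for i, (name, ratio) in enumerate(ratios):
            cumulative += ratio
            # The last subset takes the remainder so no item is dropped.
            if i == len(ratios) - 1:
                end = len(items)
            else:
                end = round(cumulative * len(items))
            for item_id in items[start:end]:
                result[item_id] = name
            start = end
    return result
```

Calling it with `[("train", .5), ("val", .2), ("test", .3)]` mirrors the `--subset` arguments in the command above.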

Testing the detection split with the VOC dataset.

$ pip install .
$ datum project import -i <path-to-voc> -f voc
$ cd voc/
$ datum project transform -t detection_split -- --subset train:.5 --subset val:.2 --subset test:.3
$ datum project info -p voc-detection_split
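The `--subset name:ratio` arguments passed after `--` in these commands could be parsed along the following lines. This is a minimal sketch using the standard `argparse` module, not the parser added in this PR; `parse_subset`, `parse_split_args`, and the `split-sketch` program name are hypothetical.

```python
import argparse


def parse_subset(spec):
    """Parse a 'name:ratio' string such as 'train:.5' into (name, ratio)."""
    name, sep, ratio = spec.partition(":")
    if not sep or not name:
        raise argparse.ArgumentTypeError(
            "expected '<name>:<ratio>', got %r" % spec)
    try:
        value = float(ratio)
    except ValueError:
        raise argparse.ArgumentTypeError("invalid ratio in %r" % spec)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError("ratio must be in [0, 1]: %r" % spec)
    return name, value


def parse_split_args(argv):
    """Return the list of (name, ratio) pairs, checking that they sum to 1."""
    parser = argparse.ArgumentParser(prog="split-sketch")
    parser.add_argument("--subset", action="append", type=parse_subset,
                        required=True, help="subset spec, e.g. train:.5")
    args = parser.parse_args(argv)
    total = sum(ratio for _, ratio in args.subset)
    if abs(total - 1.0) > 1e-6:
        parser.error("subset ratios must sum to 1, got %s" % total)
    return args.subset
```

Validating the sum up front gives the user an error before any data is touched.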

Testing the re-identification split with the ImageNet dataset.

Note: Datumaro doesn't currently support re-id datasets, so a classification dataset is used instead.

$ pip install .
$ datum project create -o imagenet
$ datum source add path <path-to-imagenet> -f imagenet -p imagenet/
$ datum project transform -t reidentification_split -p imagenet/ -- --subset train:.5 --subset val:.2 --subset test:.3 --query .5
$ datum project info -p imagenet-reidentification_split

Checklist

I submit my changes into the develop branch
I have added description of my changes into CHANGELOG
I have updated the documentation accordingly
I have added tests to cover my changes
I have linked related issues

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.
I have updated the license header for each file (see an example below)

# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

datumaro/plugins/splitter.py

README.md

tests/test_splitter.py

zhiltsov-max

Please check the updated class descriptions for correctness.

Future updates could include:

ignoring attributes in classification split (for captions, descriptions and other technical attributes)
splitting using an attribute as label in classification split
using polygons and masks in detection split

README.md

jihyeonyi · 2021-01-14T10:24:23Z

datumaro/plugins/splitter.py

+    Produces a split with a specified ratio of images, avoiding having same
+    labels in different subsets.|n


Here, what we avoid sharing is the person ID or object ID. That ID is taken from the label, or from an attribute if attr_for_id is specified.

One more thing: the train and val sets actually do share person or object IDs (most person re-identification datasets don't have a val set, though). They do not share IDs with the test set.
I'm not sure how accurate the explanation should be.
If you feel the current explanation is sufficient, please leave it as it is.
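To make the ID-sharing rule described in this comment concrete, here is a minimal sketch: identities assigned to test never appear in train or val, while train and val split the images of the same identities. It is not the code in this PR. `reid_split` is a hypothetical name, and as a simplification the test ratio is applied to identities rather than to images.

```python
import random
from collections import defaultdict


def reid_split(ids, ratios, seed=0):
    """Split items so that test identities are disjoint from train/val.

    ids: dict mapping item id -> identity (a label or attribute value)
    ratios: dict with 'train', 'val' and 'test' ratios
    """
    rng = random.Random(seed)

    # Pick whole identities for the test subset.
    identities = sorted(set(ids.values()))
    rng.shuffle(identities)
    n_test = round(ratios["test"] * len(identities))
    test_ids = set(identities[:n_test])

    # Share of the remaining images that goes to val.
    trainval = ratios["train"] + ratios["val"]
    val_share = ratios["val"] / trainval if trainval else 0.0

    result = {}
    rest = defaultdict(list)
    for item, identity in sorted(ids.items()):
        if identity in test_ids:
            result[item] = "test"
        else:
            rest[identity].append(item)

    # Train and val share identities: split each identity's images.
    for identity in sorted(rest):
        items = rest[identity]
        rng.shuffle(items)
        n_val = round(val_share * len(items))
        for i, item in enumerate(items):
            result[item] = "val" if i < n_val else "train"
    return result
```

Splitting by identity first is what guarantees the disjointness; a plain per-image shuffle would leak identities into the test subset.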

jihyeonyi · 2021-01-14T10:49:02Z

Please check the updated class descriptions for correctness.

Future updates could include:

ignoring attributes in classification split (for captions, descriptions and other technical attributes)

splitting using an attribute as label in classification split

using polygons and masks in detection split

Thank you for revising the descriptions.
Regarding the future updates:

Would you like to remove the attribute-based splitting or just make it optional?
I think the latter is better.
When you say 'splitting using an attribute as label', do you mean splitting using only attributes, regardless of labels?
Does the detection task have polygons or masks? I thought those were for the segmentation task, but maybe I'm wrong.
For your information, I'll add a splitter for the segmentation task, so how about adding polygons and masks later?

zhiltsov-max · 2021-01-14T14:19:39Z

Would you like to remove the attribute-based splitting or just make it optional?

Optional, enabled by default.

When you say 'splitting using an attribute as label', do you mean splitting using only attributes, regardless of labels?

I mean using a single attribute, as in re-id. Possibly also using some subset of the attributes, or ignoring some of them.

Does the detection task have polygons or masks?

In Mask R-CNN, they are intermixed with the segmentation task. Personally, I consider these types of annotations more or less interchangeable, because all of them can be used to train both segmentation and detection algorithms.
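One way a polygon annotation could feed a detection-style split is by reducing it to its enclosing box. The helper below is a hypothetical sketch of that idea, assuming a flat `[x0, y0, x1, y1, ...]` point list; it is not part of this PR.

```python
def polygon_to_bbox(points):
    """Return the enclosing (x, y, width, height) box of a polygon.

    points: flat list of coordinates [x0, y0, x1, y1, ...]
    """
    if len(points) < 6 or len(points) % 2 != 0:
        raise ValueError("a polygon needs at least 3 (x, y) points")
    xs = points[0::2]
    ys = points[1::2]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)
```

With a conversion like this, a splitter that only looks at box labels could treat polygons and boxes uniformly.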


jihyeonyi force-pushed the kate/splitter-cli branch from 84b16c6 to ff4cd80 January 12, 2021 04:36

zhiltsov-max previously approved these changes Jan 13, 2021


jihyeonyi dismissed zhiltsov-max’s stale review via dc85888 January 14, 2021 03:15

jihyeonyi force-pushed the kate/splitter-cli branch from ff4cd80 to dc85888 January 14, 2021 03:15

Update changelog

bec7467

zhiltsov-max approved these changes Jan 14, 2021


zhiltsov-max merged commit 1ee908f into develop Jan 14, 2021

zhiltsov-max deleted the kate/splitter-cli branch February 16, 2021 10:55

jihyeonyi and others added 6 commits March 13, 2021 15:34

add cli support for classification/detection splitter

e9c896e

revisit re-id splitter and implement cli for re-id

011b852

update documentation for task-specific split

875e385

add changelog and revert toc part of user_manual and README

d3231a4

1. add more description regarding split, 2. move to base class (_TaskSpecificSplit), 3. revise test code

dc85888

Update docs

55e0718
