
User documentation for Pascal VOC format #228

Merged
merged 27 commits into develop from sk/pascal-voc-user-documentation
May 14, 2021

Conversation

sizov-kirill

Summary

Added user documentation for Pascal VOC format #224

How to test

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

@zhiltsov-max zhiltsov-max linked an issue Apr 29, 2021 that may be closed by this pull request
datum export -f voc_layout -- --label-map voc
```

## Datumaro functionality
Contributor

Try to present this part as a tutorial about solving some practical problems. Try to show and explain why we might need to do this. It will be much more interesting and useful than just a list of commands.

Unmatched items in the second dataset: {('2011_002719', 'trainval'), ... }

# extract dataset with only car and bus class items from train subset
datum filter --mode items+annotations \
Contributor

Every transform command outputs its result into a new project directory. An output directory can be specified with `-o`.

Contributor

@zhiltsov-max zhiltsov-max left a comment

The PR is good, but it needs to be more practical. There is no need to show everything Datumaro can do, but there is a need to show how Datumaro can be used with PASCAL VOC to solve relevant tasks.

The Pascal VOC dataset is available for free download
[here](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit)

There are two ways to create datumaro project and add Pascal VOC dataset to it
Contributor

Suggested change
There are two ways to create datumaro project and add Pascal VOC dataset to it
There are two ways to create datumaro project and add Pascal VOC dataset to it:

Comment on lines 66 to 68
The ImageSets directory should contain at least one of the directories:
Main, Layout, Action, Segmentation. These directories contain `.txt` files
with a list of images in a subset, the subset name is the same as the `.txt` file name .
Contributor

Suggested change
The ImageSets directory should contain at least one of the directories:
Main, Layout, Action, Segmentation. These directories contain `.txt` files
with a list of images in a subset, the subset name is the same as the `.txt` file name .
The `ImageSets` directory should contain at least one of the directories:
`Main`, `Layout`, `Action`, `Segmentation`. These directories contain `.txt` files
with a list of images in a subset, the subset name is the same as the `.txt` file name .

Main, Layout, Action, Segmentation. These directories contain `.txt` files
with a list of images in a subset, the subset name is the same as the `.txt` file name .
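The `ImageSets` layout described above can be illustrated with a small stdlib-only sketch (directory and subset names follow the VOC convention; the item ids are hypothetical):

```python
import os

def write_subset(voc_root, category, subset, item_ids):
    """Create ImageSets/<category>/<subset>.txt with one image id per line.

    The subset name is taken from the .txt file name, as described above.
    """
    subset_dir = os.path.join(voc_root, "ImageSets", category)
    os.makedirs(subset_dir, exist_ok=True)
    path = os.path.join(subset_dir, subset + ".txt")
    with open(path, "w") as f:
        f.write("\n".join(item_ids) + "\n")
    return path

# Hypothetical ids: creates VOC2012/ImageSets/Main/train.txt
write_subset("VOC2012", "Main", "train", ["2011_002719", "2011_002720"])
```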

Also it is possible to add Pascal VOC dataset and specify task for it, for example:
Contributor

Suggested change
Also it is possible to add Pascal VOC dataset and specify task for it, for example:
It is also possible to import specific tasks of PASCAL VOC dataset instead of the whole dataset, for example:

Comment on lines 75 to 79
In addition to `voc_detection`, Datumaro supports
`voc_action` (for action classification task),
`voc_classification`,
`voc_segmentation`,
`voc_layout` (for person layout task).
Contributor

Suggested change
In addition to `voc_detection`, Datumaro supports
`voc_action` (for action classification task),
`voc_classification`,
`voc_segmentation`,
`voc_layout` (for person layout task).
Datumaro supports the following PASCAL VOC tasks:
- Image classification (`voc_classification`)
- Object detection (`voc_detection`)
- Action classification (`voc_action`)
- Class and instance segmentation (`voc_segmentation`)
- Person layout detection (`voc_layout`)

Comment on lines 81 to 82
To make sure that the selected dataset has been added to the project, you can run
`datum info`
Contributor

Suggested change
To make sure that the selected dataset has been added to the project, you can run
`datum info`
To make sure that the selected dataset has been added to the project, you can run
`datum info`, which will display the project and dataset information.

-o <path/to/output/project>

# delete other labels from dataset
datum transform -t remap_labels -- -l person:person --default delete \
Contributor

I suggest providing a concrete working example instead of a list of unrelated commands. Ensure they can be executed.

Author

But this example solves a concrete task: preparing a Pascal VOC dataset for conversion to another dataset format, using a related set of Datumaro operations. Doesn't it?

Contributor

@zhiltsov-max zhiltsov-max May 4, 2021

It is. But this example cannot be executed, and if it could be, it wouldn't create the expected dataset. Please make sure these commands do what they are supposed to do.

```

- If you don't need a variety of classes in Pascal VOC dataset,
with datumaro you can group the classes for your task:
Contributor

Suggested change
with datumaro you can group the classes for your task:
with Datumaro you can group the classes for your task:

The following command won't group classes; it will rename and join them. I suggest picking a better term for this.
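The rename-and-join behavior with a delete default can be modeled in a few lines of plain Python (this is an illustrative sketch of the semantics, not Datumaro's implementation; the label names are hypothetical):

```python
def remap_labels(labels, mapping, default="keep"):
    """Rename labels via `mapping`; unmapped labels are kept or deleted.

    Mapping several source labels to one target joins them under one name.
    """
    result = []
    for label in labels:
        if label in mapping:
            result.append(mapping[label])
        elif default == "keep":
            result.append(label)
        # default == "delete": the annotation is dropped
    return result

# Join 'car' and 'bus' into 'vehicle', drop everything else
print(remap_labels(["car", "bus", "person", "car"],
                   {"car": "vehicle", "bus": "vehicle"},
                   default="delete"))
# ['vehicle', 'vehicle', 'vehicle']
```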

- Also, `datum stats` includes information about how many items each class contains,
example for Pascal VOC 2012:

<details>
Contributor

The formatting is broken here


## Dataset statistics

Datumaro can calculate dataset statistics, the command `datum stats` creating
Contributor

This section looks too generic and unrelated to PASCAL VOC.

datum diff -p ./project2012 ./project2007

Datasets have different lengths: 14974 vs 34314
Unmatched items in the first dataset: {('00973', 'train'), ...}
Contributor

It would be good to show that the final result actually matches the expected difference, which is shown in the official dataset description. Now it is just a result with no meaning.
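The "unmatched items" output above is essentially set arithmetic over (item id, subset) pairs, which is also a quick way to check the result against the official dataset description. A minimal sketch with hypothetical ids:

```python
def diff_items(first, second):
    """Compare two datasets given as sets of (item_id, subset) pairs."""
    return {
        "unmatched_in_first": first - second,
        "unmatched_in_second": second - first,
    }

# Hypothetical item ids for two VOC releases
voc2012 = {("00973", "train"), ("2011_002719", "trainval")}
voc2007 = {("00973", "train"), ("000001", "test")}

d = diff_items(voc2012, voc2007)
print(d["unmatched_in_first"])   # {('2011_002719', 'trainval')}
print(d["unmatched_in_second"])  # {('000001', 'test')}
```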

Comment on lines 247 to 248
datum import -n ./project2007 -f voc -i <path/to/voc/2007>
datum import -n ./project2012 -f voc -i <path/to/voc/2012>
Contributor

Suggested change
datum import -n ./project2007 -f voc -i <path/to/voc/2007>
datum import -n ./project2012 -f voc -i <path/to/voc/2012>
datum import -o ./project2007 -f voc -i <path/to/voc/2007>
datum import -o ./project2012 -f voc -i <path/to/voc/2012>

--default delete
```

- Example 4. When choosing a dataset for research, it is often useful to find out how the
Contributor

Suggested change
- Example 4. When choosing a dataset for research, it is often useful to find out how the
- Example 4. When multiple datasets are used for research, it can be useful to find out how the

datum info
```

- Example 2. Pascal VOC 2007 uses about 900MB of disk space; you can store half as much if you keep
Contributor

Disk space is a fairly weak argument. I suggest considering a cross-validation scenario, where the validation subset is split into N parts, and a model is trained N times on N-1 parts and validated on the remaining part.
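The cross-validation idea can be sketched without any Datumaro-specific calls: partition the item ids into N folds and rotate the held-out fold (item ids hypothetical):

```python
def kfold(item_ids, n):
    """Split item_ids into n folds; yield (train, val) pairs."""
    folds = [item_ids[i::n] for i in range(n)]
    for i in range(n):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val

ids = [f"img_{i}" for i in range(6)]
for train, val in kfold(ids, 3):
    pass  # train a model on `train`, validate on `val`
```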


``` bash
# create Datumaro project with Pascal VOC dataset
datum import -n myproject -f voc -i <path/to/voc/dataset>
Contributor

@zhiltsov-max zhiltsov-max May 5, 2021

Using -o instead of -n might be more convenient. -n is going to be deprecated, apparently.

There are few examples of using Datumaro operations to solve
particular problems:

- Example 1. Preparing Pascal VOC dataset for converting to Market-1501 dataset format.
Contributor

Add an explanation of why these transformations are needed for the conversion.

To make sure that the selected dataset has been added to the project, you can run
`datum info`, which will display the project and dataset information.

## Export to other formats
Contributor

@zhiltsov-max zhiltsov-max May 6, 2021

Add a section about supported annotation types, attributes and their interpretation, import and export options, optional and mandatory files. Add a link to tests for code examples. Add an example of import results (DatasetItems and their contents). Add notes about annotation values and their relationships. Refer to https://github.com/openvinotoolkit/cvat/blob/develop/cvat/apps/dataset_manager/formats/README.md

datum import -o myproject -f voc -i <path/to/voc/dataset>

# convert labeled shapes into bboxes
datum transform -t shapes_to_boxes
Contributor

Suggested change
datum transform -t shapes_to_boxes
datum transform -p myproject -t shapes_to_boxes

# train and validate the model ...
```

- Example 3. If you don't need a variety of classes in Pascal VOC dataset,
Contributor

@zhiltsov-max zhiltsov-max May 6, 2021

Let's redo the examples this way:

  • how to prepare an original dataset for training
  • how to create a custom dataset in this format
  • how to parse such dataset in the code
  • maybe add small examples of specific features (mask format conversion, mask color manipulation, etc.)

Keep them simple, but informative.
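For the "parse such a dataset in the code" item, one stdlib-only way to read a VOC `Annotations/*.xml` file looks like this (the inline XML is a minimal stand-in for a real annotation file):

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for one Annotations/*.xml file
VOC_XML = """
<annotation>
  <filename>2011_002719.jpg</filename>
  <object>
    <name>person</name>
    <bndbox><xmin>48</xmin><ymin>24</ymin><xmax>120</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""

def parse_objects(xml_text):
    """Return (label, (xmin, ymin, xmax, ymax)) for each <object>."""
    root = ET.fromstring(xml_text)
    out = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.find(k).text)
                       for k in ("xmin", "ymin", "xmax", "ymax"))
        out.append((obj.find("name").text, coords))
    return out

print(parse_objects(VOC_XML))  # [('person', (48, 24, 120, 200))]
```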

Contributor

@zhiltsov-max zhiltsov-max left a comment

Ok, looks almost finished.

To get information about them, run
`datum export -f <FORMAT> -- -h`
These options are passed after double dash (`--`) in the command line.
For example, the `voc_segmentation` format has an extra argument
Contributor

This information is described twice. Better replace it with --save-images, for example.

datum convert -if voc -i <path/to/voc> -f coco -o <path/to/output/dir>
```

It is also possible to use filters when converting; check
Contributor

I'm not sure filtering options should be described here.

There are a few ways to convert a Pascal VOC dataset to another dataset format:

``` bash
datum project import -f voc -i <path/to/voc>
Contributor

Suggested change
datum project import -f voc -i <path/to/voc>
datum import -f voc -i <path/to/voc>


Argument `--tasks` allows specifying which tasks to export;
by default Datumaro uses all tasks.
Argument `--label_map` allows defining a user label map, for example
Contributor

Add descriptions for all the export parameters. Add information on how to use a colormap to change colors in an existing dataset. Add information on how the colormap should look in order to import a dataset in the grayscale format.
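For reference, the standard VOC colormap is generated by spreading the bits of the class index across the R, G, and B channels; a stdlib-only sketch of that well-known procedure:

```python
def voc_colormap(n=256):
    """Generate the standard Pascal VOC color palette as RGB tuples."""
    cmap = []
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            # Spread bits 0, 1, 2 of the index over r, g, b, high bit first
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap

palette = voc_colormap()
print(palette[0], palette[1], palette[15])
# (0, 0, 0) (128, 0, 0) (192, 128, 128)
```

Index 0 is the background, index 15 is `person`, and index 255 comes out as (224, 224, 192), the border color used in VOC segmentation masks.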

datum stats -p project # check statistics.json -> repeated images
datum transform -p project -o ndr_project -t ndr -- -w trainval -k 2500
datum filter -p ndr_project -o trainval2500 -e '/item[subset="trainval"]'
datum transform -p trainval2500 -o semantic_seg -t merge_instance_segments
Contributor

merge_instance_segments doesn't make sense for this format.

], categories=['person', 'sky', 'water', 'lion'])

dataset.transform('polygons_to_masks')
dataset.export('./mydataset', 'voc', label_map='my_labelmap.txt')
Contributor

Suggested change
dataset.export('./mydataset', 'voc', label_map='my_labelmap.txt')
dataset.export('./mydataset', format='voc', label_map='my_labelmap.txt')


train_dataset.select(only_jumping)

train_dataset.export('./jumping_label_me', 'label_me', save_images=True)
Contributor

Suggested change
train_dataset.export('./jumping_label_me', 'label_me', save_images=True)
train_dataset.export('./jumping_label_me', format='label_me', save_images=True)

```python
from datumaro.components.dataset import Dataset

dataset = Dataset.import_from('./VOC2012', 'voc')
Contributor

Suggested change
dataset = Dataset.import_from('./VOC2012', 'voc')
dataset = Dataset.import_from('./VOC2012', format='voc')

from datumaro.components.dataset import Dataset
from datumaro.components.extractor import AnnotationType

dataset = Dataset.import_from('./VOC2012', 'voc')
Contributor

Suggested change
dataset = Dataset.import_from('./VOC2012', 'voc')
dataset = Dataset.import_from('./VOC2012', format='voc')

Extra options for export to Pascal VOC format:

- `--save-images` allow to export dataset with saving images
(be default `False`);


Suggested change
(be default `False`);
(by default `False`);

zhiltsov-max
zhiltsov-max previously approved these changes May 11, 2021
Contributor

@zhiltsov-max zhiltsov-max left a comment

Looks good to me.

datum export -f voc_segmentation -- --label-map mycolormap.txt

# or you can use the original voc colormap:
datum export -f voc_segmentation -- --label-map mycolormap.txt
Contributor

Suggested change
datum export -f voc_segmentation -- --label-map mycolormap.txt
datum export -f voc_segmentation -- --label-map voc

(be default `False`);

- `--image-ext IMAGE_EXT` allow to specify image extension
for exporting dataset (by default `.jpg`);
Contributor

Suggested change
for exporting dataset (by default `.jpg`);
for exporting dataset (by default - use original or `.jpg`, if none);

Extra options for export to Pascal VOC format:

- `--save-images` allow to export dataset with saving images
(be default `False`);
Contributor

Suggested change
(be default `False`);
(by default `False`);

and instance masks (by default `True`);

- `--allow-attributes ALLOW_ATTRIBUTES` allow export of attributes
(by default `ALLOW_ATTRIBUTES = True`);
Contributor

Suggested change
(by default `ALLOW_ATTRIBUTES = True`);
(by default `True`);

zhiltsov-max
zhiltsov-max previously approved these changes May 14, 2021
@zhiltsov-max
Contributor

Please update the changelog and merge.

@zhiltsov-max zhiltsov-max merged commit ef003ca into develop May 14, 2021
@zhiltsov-max zhiltsov-max deleted the sk/pascal-voc-user-documentation branch May 27, 2021 11:00
zhiltsov-max pushed a commit that referenced this pull request Jun 3, 2021
* Rename 'openvino' plugin to 'openvino_plugin' (#205)

Co-authored-by: Jihyeon Yi <jihyeon.yi@intel.com>

* Make remap labels more accurate, allow explicit label deletion, add docs, update tests (#203)

* Kate/handling multiple attributes and speed up detection split (#207)

* better handling multi-attributes for classification_split

* handling multi-attributes better for detection

* bugfix in calculating required number of images for splitting 2 correct side effect of the changes for re-id split

* allow multiple subsets with arbitrary names

* rename _is_number to _is_float and improve it

* Fix voc to coco example (#209)

* Fix export filtering

* update example in readme

* Fix export filename for LabelMe format (#200)

* change export filename for LabelMe format

* Allow simple merge for datasets with no labels

* Add a more complex test on relative paths

* Support escaping in attributes

* update changelog

Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* split unlabeled data into subsets for task-specific splitters (#211)

* split unlabeled data into subsets for classification, detection. for re-id, 'not-supported' subsets for this data

* Fix image ext on saving in cvat format (#214)

* fix image saving in cvat format

* update changelog

* Label "face" for bounding boxes in Wider Face (#215)

* add face label

* update changelog

* Adding "difficult", "truncated", "occluded" attributes when converting to Pascal VOC if they are not present (#216)

* remove check for 'difficult' attribute

* remove check for 'truncated' and 'occluded' attributes

* update changelog

* Ignore empty lines in YOLO annotations (#221)

* Ignore empty lines in yolo annotations

* Add type hints for image class, catch image opening errors in image.size

* update changelog

* Classification task in LFW dataset format (#222)

* add classification

* update changelog

* update documentation

* Add splitter for segmentation task  (#223)

* added segmentation_split

* updated changelog

* rename reidentification to reid

* Support for CIFAR-10/100 format (#225)

* add CIFAR dataset format

* add CIFAR to documentation

* update Changelog

* add validation item for instance segmentation (#227)

* add validation item for instance segmentation

* Add panoptic and stuff COCO format (#210)

* add coco stuff and panoptic formats

* update CHANGELOG

Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* update detection splitter algorithm from # of samples to # of instances (#235)

* add documentation for validator (#233)

* add documentation for validator

* add validation item description (#237)

* Fix converter for Pascal VOC format (#239)

* User documentation for Pascal VOC format (#228)

* add user documentation for Pascal VOC format

* add integration tests

* update changelog

* Support for MNIST dataset format (#234)

* add mnist format

* add mnist csv format

* add mnist to documentation

* make formats docs folder, create COCO format documentation (#241)

* Make formats docs folder, move format docs

* Create COCO format documentation

* Fixes in CIFAR dataset format (#243)

* Add folder creation

* Update changelog

* Add user documentation file and integration tests for YOLO format (#246)

* add user documentation file for yolo

* add integraion tests

* update user manual

* update changelog

* Add Cityscapes format (#249)

* add cityscapes format

* add format docs

* update changelog

* Fix saving attribute in WiderFace extractor (#251)

* add fixes

* update changelog

* Fix spelling errors (#252)

* Configurable Threshold CLI support (#250)

* add validator cli

* add configurable validator threshold

* update changelog

* CI. Move to GitHub actions. (#263)

* Moving to GitHub Actions

* Sending a coverage report if python3.6 (#264)

* Rename workflows (#265)

* Rename workflows

* Update repo config and badge (#266)

* Update PR template

* Update build status badge

* Fix deprecation warnings (#270)

* Update RISE docs (#255)

* Update rise docs

* Update cli help

* Pytest related changes (#248)

* Tests moved to pytest. Updated CI. Updated requirements.

* Updated contribution guide

* Added annotations for tests

* Updated tests

* Added code style guide

* Fix CI (#272)

* Fix script call

* change script call to binary call

* Fix help program name, add mark_bug (#275)

* Fix prog name

* Add mark_bug test annotation

* Fix labelmap parameter in CamVid (#262)

* Fix labelmap parameter in camvid

* Release 0.1.9 (dev) (#276)

* Update version

* Update changelog

* Fix numpy conflict (#278)

Co-authored-by: Emily Chun <emily.chun@intel.com>
Co-authored-by: Jihyeon Yi <jihyeon.yi@intel.com>
Co-authored-by: Kirill Sizov <kirill.sizov@intel.com>
Co-authored-by: Anastasia Yasakova <anastasia.yasakova@intel.com>
Co-authored-by: Harim Kang <harimx.kang@intel.com>
Co-authored-by: Zoya Maslova <zoya.maslova@intel.com>
Co-authored-by: Roman Donchenko <roman.donchenko@intel.com>
Co-authored-by: Seungyoon Woo <seung.woo@intel.com>
Co-authored-by: Dmitry Kruchinin <33020454+dvkruchinin@users.noreply.github.com>
Co-authored-by: Slawomir Strehlke <slawomir.strehlke@intel.com>
zhiltsov-max pushed a commit that referenced this pull request Jul 14, 2021

* Add changelog stub (#279)

* tests/requirements.py: remove the test_wrapper functions (#285)

* Subformat importers for VOC and COCO (#281)

* Document find_sources

* Add VOC subformat importers

* Add coco subformat importers

* Fix LFW

* Reduce voc detect dataset cases

* Reorganize coco tests, add subformat tests

* Fix default subset handling in Dataset

* Fix getting subset

* Fix coco tests

* Fix voc tests

* Update changelog

* Add image zip format (#273)

* add tests

* add image_zip format

* update changelog

Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* Add KITTI detection and segmentation formats (#282)

* Add KITTI detection and segmentation formats

* Remove unused import

* Add KITTI user manual

Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* Fix loading file and image processing in CIFAR (#284)

* Fix image layout and encoding problems

* Update Changelog

Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* CLI tests for convert command for VOC dataset (#286)

* Add tests for convert command

* Convert most enum definitions from the functional style to the class style (#290)

* yolo format documentation update (#295)

* add info about coordinates in yolo format doc

* Fix merged dataset item filtering (#258)

* Add tests

* Fix xpathfilter transform

* Update changelog

* Sms/pytest marking cityscapes and zip (#298)

* Updated pytest marking for cityscapes and imagezip.

* Introduce Validator plugin type (#299)

* Introduce Validator plugin type

* Fix validator definitions (#303)

* update changelog

* Fixes in validator definitions

* Update validator cli

* Make TF availability check optional (#305)

* Make tf availability check optional

* update changelog

* Update pylint (#304)

* Add import order check in pylint

* Fix some linter problems

* Remove warning suppression comments

* Add lazy loading for builtin plugins (#306)

* Refactor env code

* Load builtin plugins lazily

* update changelog

* Update transforms handling in Dataset (#297)

* Update builtin transforms

* Optimize dataset length computation when no source

* Add filter test

* Fix transforms affecting categories

* Optimize categories transforms

* Update filters

* fix imports

* Avoid using default docstrings in plugins

* Fix patch saving in VOC, add keep_empty export parameter

* Fix flush_changes

* Fix removed images and subsets in dataset patch

* Update changelog

* Update voc doc

* Skip item transform base class in plugins

* Readable COCO and datumaro format for CJK (#307)

* Do not force ASCII in COCO and Datumaro JSONs for readable CJK

* Add tests

* Use utf-8 encoding for writing

Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* Force utf-8 everywhere (#309)

* Fix in ImageNet_txt (#302)

* Add extensions for images to annotation file

* Remove image search in extractor

* Update changelog

Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* Reduce duplication of dependency information (#308)

* Move requirements from setup.py to requirements-base.txt

* Add whitespace error checking to GitHub Actions (#311)

* Fix whitespace errors

As detected with `git diff --check`.

* Add a job to check for whitespace errors

I called it "lint" so that other checks could be added to it later.

* Bump copyright years in changed files

* Add initial support for the Open Images dataset (#291)

* Support reading of Labels in Open Images (v4, v5, v6)

* Add tests for the Open Images extractor/importer

* Add Open Images documentation

* Update changelog

* Fix tensorboardX dependency (#318)

* Fixing remark-lint issues. Adding remark-linter check. (#321)

* Fix remark-lint issues.

* Align continuation lines with the first line.

Apply comments

* Added remark check

* Add an upper bound on the Pillow dependency to work around a regression in 8.3 (#323)

* open_images_user_manual.md: fix image description file URLs

I accidentally swapped the URLs for test and validation sets.

* Fix COCO Panoptic (#319)

* add test

* Fix integer overflow in bgr2index
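The overflow arises because bit-shifting `uint8` pixel values cannot hold the combined segment index, so the array must be widened to a larger integer type before shifting. A sketch of the widened decoding (the channel layout here follows the COCO panoptic id-from-color convention and is an assumption, not the exact Datumaro code):

```python
import numpy as np

def bgr2index(bgr):
    """Decode an HxWx3 uint8 color map into integer segment ids."""
    bgr = bgr.astype(np.uint32)  # the overflow fix: widen before shifting
    return (bgr[..., 0] << 16) | (bgr[..., 1] << 8) | bgr[..., 2]

pixel = np.array([[[1, 2, 3]]], dtype=np.uint8)
assert int(bgr2index(pixel)[0, 0]) == (1 << 16) + (2 << 8) + 3
```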

* Fix pylint issues. Added pylint checking. (#322)

* Added pylint job for CI

* Rework pip install

* Fixed remaining pylint warnings

Co-authored-by: Andrey Zhavoronkov <andrey.zhavoronkov@intel.com>

* Open Images: add writing support (#315)

* open_images_user_manual.md: fix image description file URLs

* open_images_format: add conversion support

* open_images_format: add support for images in subdirectories

* open_images_format: add tests for writing support

* open_images_format: add documentation for the writing support

* Update the changelog entry for the Open Images support

* Add python bandit checks. (#316)

* Add bandit dependency

* Add bandit checks on CI

* Disable some warnings

Co-authored-by: Andrey Zhavoronkov <andrey.zhavoronkov@intel.com>
Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* Remove Pylint unused-import warning suppressions (#326)

* Remove Pylint unused-import warning suppressions

* Add a job to check import formatting using isort (#333)

* Reformat all imports using isort

* Implement a workflow for checking import formatting based on isort

* Reformat the enabled checker list in .pylintrc (#335)

Put each code on its own line and add a comment with its symbolic name.
That makes the list more understandable and easier to edit.

* Merge all linting jobs into one workflow file (#331)

Doing it this way means that on GitHub's Checks page, all jobs are displayed
under one "Linter" category, instead of multiple indistinguishable "Linter"
categories with one job each.

Move the whitespace checking job into the Linter workflow as well, since
that's where it logically belongs.

I also took the opportunity to slightly rename the jobs in order to spell
the linter names correctly.

* Fix cuboids / 3d / M6 (#320)

* CVAT-3D Milestone-6: Added Supervisely Point Cloud and KITTI Raw 3D formats

* Added Cuboid3d annotations

* Added docs for new formats

Co-authored-by: cdp <cdp123>
Co-authored-by: Jayraj <jayrajsolanki96@gmail.com>
Co-authored-by: Roman Donchenko <roman.donchenko@intel.com>

* Clean up .pylintrc (#340)

* Clean up the list of messages in .pylintrc

* Remove obsolete Pylint options

* .pylintrc: move the disable setting and its documentation together

* Remove the commented-out setting.

* Revert "Add an upper bound on the Pillow dependency to work around a regression in 8.3 (#323)" (#341)

The regression was fixed in 8.3.1.

This reverts commit 9a85616.

* Enable pylint checkers that find invalid escape sequences (#344)

Fix the issues that they found.

* Factor out the images.meta loading code from YoloExtractor (#343)

* Factor out the images.meta loading code from YoloExtractor

It looks like the same thing will be needed for Open Images, so I'm
moving it to a common module.

* Rework image.meta parsing code to use shell syntax

This allows comments and improves extensibility.
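Shell-style tokenization via the stdlib `shlex` module gives `#` comments and quoted file names for free. A sketch of the idea, assuming each `images.meta` line holds `<image name> <height> <width>` (the field order is an assumption for illustration):

```python
import shlex

def parse_image_meta(text):
    """Parse image size metadata using shell-like tokenization,
    so comment lines and quoted names with spaces are handled."""
    meta = {}
    for line in text.splitlines():
        fields = shlex.split(line, comments=True)  # '#' starts a comment
        if not fields:
            continue
        name, height, width = fields
        meta[name] = (int(height), int(width))
    return meta

meta = parse_image_meta('a.jpg 480 640\n# a comment\n"b c.jpg" 10 20')
assert meta == {'a.jpg': (480, 640), 'b c.jpg': (10, 20)}
```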

* Support for CIFAR-100 (#301)

* Add support for CIFAR-100

* Update Changelog

* Update user_manual.md

* Add notes about differences in formats

* Fix importing for VGG Face 2 (#345)

* correct asset according to the original vgg_face2 dataset

* fix importing of the original dataset

Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* Dataset caching fixes (#351)

* Fix importing arbitrary file names in COCO subformats

* Optimize subset iteration in a simple scenario

* Fix subset iteration in dataset with transforms

* Cuboid 3D for Datumaro format (#349)

* Support cuboid_3d and point cloud in datumaro format

* Add cuboid_3d and point cloud tests in datumaro format

* Add image size type conversions

Co-authored-by: Maxim Zhiltsov <maxim.zhiltsov@intel.com>

* Add e2e tests for cuboids (#353)

* Add attr name check in kitti raw

* Add sly pcd e2e test

* Rename "object" attribute to "track_id" in sly point cloud

* Add kitti raw e2e test

* Update kitti raw example

* update changelog

* Release 0.1.10 (dev) (#354)

* Update changelog

* Add cifar security notice

* Update version

Co-authored-by: Emily Chun <emily.chun@intel.com>
Co-authored-by: Jihyeon Yi <jihyeon.yi@intel.com>
Co-authored-by: Kirill Sizov <kirill.sizov@intel.com>
Co-authored-by: Anastasia Yasakova <anastasia.yasakova@intel.com>
Co-authored-by: Harim Kang <harimx.kang@intel.com>
Co-authored-by: Zoya Maslova <zoya.maslova@intel.com>
Co-authored-by: Roman Donchenko <roman.donchenko@intel.com>
Co-authored-by: Seungyoon Woo <seung.woo@intel.com>
Co-authored-by: Dmitry Kruchinin <33020454+dvkruchinin@users.noreply.github.com>
Co-authored-by: Slawomir Strehlke <slawomir.strehlke@intel.com>
Co-authored-by: Jaesun Park <diligensloth@gmail.com>
Co-authored-by: Andrey Zhavoronkov <andrey.zhavoronkov@intel.com>
Co-authored-by: Jayraj <jayrajsolanki96@gmail.com>
Successfully merging this pull request may close these issues.

User documentation for Pascal VOC format