Releases: openvinotoolkit/datumaro

Release 1.1.0rc1

17 Mar 08:40
Pre-release

What's Changed - Brief Version

Added

  • Add with_subset_dirs decorator (Add ImagenetWithSubsetDirsImporter)
    (#816)
  • Add CommonSemanticSegmentationWithSubsetDirsImporter
    (#826)
  • Add DatumaroBinary format
    (#828, #829, #830, #831)
  • Add Searcher CLI documentation
    (#838)
  • Add version to dataset exported as datumaro format
    (#842)
  • Add Ava action data format support
    (#847)
  • Add Shift Analyzer (both covariate and label shifts)
    (#855)
  • Add YOLO Loose format
    (#856)
  • Add Ultralytics YOLO format
    (#859)

Changed

  • Refactor Datumaro format code and test code
    (#824)

Fixed

  • Fix image filenames and anomaly mask appearance in MVTec exporter
    (#835)
  • Fix CIFAR10 and 100 detect function
    (#836)
  • Fix celeba and align_celeba detect function
    (#837)
  • Choose the top priority detect format for all directory depths
    (#839)
  • Fix MVTec format detect function
    (#843)
  • Fix wrong __len__() of Subset when the item is removed
    (#854)
  • Fix mask visualization bug
    (#860)

Full Changelog: v1.0.0...v1.1.0rc1

Release v1.0.0

24 Feb 09:15
b0d58a0

Added

  • Add Data Explorer (#773)
  • Add Ellipse annotation type (#807)
  • Add MVTec anomaly data support (#810)

Changed

  • Refactor existing tests (#803)
  • Raise ImportError on importing malformed COCO directory (#812)
  • Remove the duplicated and cyclical category context in documentation (#822)

Fixed

Release v0.5.0

31 Jan 01:36
10543df

Added

  • Add Tile transformation (#790)
  • Add Video keyframe extraction (#791)
  • Add TileTransform documentation and Jupyter notebook example (#794)
  • Add MergeTile transformation (#796)
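To illustrate the idea behind tiling, here is a minimal sketch that splits an image area into a grid of tile boxes. The function name `tile_boxes` and its parameters are hypothetical and chosen for illustration only; they are not Datumaro's Tile or MergeTile API, which may handle overlap and annotations differently.

```python
def tile_boxes(width, height, cols, rows):
    """Return (x, y, w, h) boxes that split a width x height image into a grid.

    Hypothetical helper for illustration; not part of Datumaro.
    """
    boxes = []
    for r in range(rows):
        # Integer boundaries keep the tiles gap-free even when the
        # image size is not divisible by the grid size.
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            boxes.append((x0, y0, x1 - x0, y1 - y0))
    return boxes


if __name__ == "__main__":
    for box in tile_boxes(100, 50, 2, 2):
        print(box)
```

Because the tiles cover the image without gaps or overlap, merging them back amounts to placing each tile at its (x, y) offset.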

Changed

  • Improved mask_to_rle performance (#770)
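For readers unfamiliar with the operation, the following is a small pure-Python sketch of what a mask-to-RLE conversion does, using the uncompressed COCO-style convention (column-major order, counts starting with a run of zeros). The name `mask_to_rle_sketch` is an assumption made for this example; it is not Datumaro's implementation and says nothing about how the performance improvement in #770 was achieved.

```python
from itertools import groupby


def mask_to_rle_sketch(mask):
    """Encode a binary mask (list of rows) as uncompressed COCO-style RLE.

    Illustrative only; not Datumaro's mask_to_rle.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    # COCO-style RLE walks the mask column by column (column-major order).
    flat = [1 if mask[r][c] else 0 for c in range(width) for r in range(height)]
    counts = []
    # Counts alternate between runs of 0s and runs of 1s, starting with 0s,
    # so a mask that begins with a 1 gets a leading zero-length run.
    if flat and flat[0] == 1:
        counts.append(0)
    counts.extend(len(list(run)) for _, run in groupby(flat))
    return {"size": [height, width], "counts": counts}


if __name__ == "__main__":
    print(mask_to_rle_sketch([[0, 1], [1, 1]]))
```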

Deprecated

  • N/A

Removed

  • N/A

Fixed

  • Fix auto-documentation for the data_format plugins (#793)

Security

  • Add security.md file for the SDL (#798)

Release v0.4.0.1

13 Dec 05:36
3df321b

Added

  • Support for label exclusivity with LabelGroup (#742)
  • Jupyter samples
    • Introducing how to merge datasets (#738)
    • Introducing how to visualize dataset (#747)
    • Introducing how to filter dataset (#748)
    • Introducing how to transform dataset (#759)
  • Visualization Python API
    • Bbox feature (#744)
    • Label, Points, Polygon, PolyLine, and Caption visualization features (#746)
    • Mask, SuperResolution, Depth visualization features (#747)
  • Documentation for Python API (#753)
    • dataset handler, visualizer, filter descriptions (#761)
  • Support for exporting as CVAT video format (#757)
  • Jupyter notebook example rendering to documentation (#758)
  • An interface to manipulate 'infos' to store the dataset meta-info (#767)
  • 'bbox' annotation when importing a COCO dataset (#772)

Changed

  • Wrap title text according to its plot width (#769)
  • Get list of subsets and support only Image media type in visualizer (#768)

Deprecated

  • N/A

Removed

  • N/A

Fixed

  • Correcting static type checking (#743)
  • Fixing a VOC dataset export error when a label contains a space (#771)

Security

  • N/A

Release v0.3.1

07 Sep 05:27
f597574

Added

  • Support for custom media types, new PointCloud media type, DatasetItem.media and .media_as(type) members (#539)
  • [API] A way to request dataset and extractor media type with media_type (#539)
  • BraTS format (import-only) (.npy and .nii.gz), new MultiframeImage media type (#628)
  • Common Semantic Segmentation dataset format (import-only) (#685)
  • An option to disable data/ prefix inclusion in YOLO export (#689)
  • New command describe-downloads to print information about downloadable datasets (#678)
  • Detection for Cityscapes format (#680)
  • Maximum recursion --depth parameter for detect-dataset CLI command (#680)
  • An option to save a single subset in the download command (#697)
  • Common Super Resolution dataset format (import-only) (#700)
  • Kinetics 400/600/700 dataset format (import-only) (#706)
  • NYU Depth Dataset V2 format (import-only) (#712)

Changed

  • env.detect_dataset() now returns a list of detected formats at all recursion levels instead of just the lowest one (#680)
  • Open Images: allowed to store annotations file in root path as well (#680)
  • Improved parsing error messages in COCO, VOC and YOLO formats (#684, #686, #687)
  • YOLO format now supports almost any subset names, except backup, names and classes (instead of just train and valid). The reserved names now raise an error on exporting. (#688)
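The reserved-name rule for YOLO subsets can be sketched as a simple check. The helper `check_yolo_subset_name` and the exception type are hypothetical and serve only to illustrate the rule stated above; Datumaro's exporter may report the error differently.

```python
# Names listed as reserved in the changelog entry above.
RESERVED_SUBSET_NAMES = {"backup", "names", "classes"}


def check_yolo_subset_name(name):
    """Return the name unchanged, or raise if it is reserved.

    Hypothetical helper for illustration; not Datumaro's API.
    """
    if name in RESERVED_SUBSET_NAMES:
        raise ValueError(f"Subset name '{name}' is reserved in the YOLO format")
    return name


if __name__ == "__main__":
    print(check_yolo_subset_name("test"))
```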

Deprecated

  • --save-images is replaced with --save-media in CLI and converter API (#539)
  • [API] image, point_cloud and related_images of DatasetItem are replaced with media and media_as(type) members and constructor parameters (#539)

Removed

  • N/A

Fixed

  • Detection for LFW format (#680)
  • Adding the image depth value when a dataset is exported in VOC format (#726)
  • Handling numerical labels in task chains properly (#726)
  • Fixing duplication of annotations nested inside another annotation (polygon) during import for VOC format (#726)

Security

  • N/A

Release v0.3: Video Support

21 Feb 11:10
7e0131d

Added

  • Ability to import a video as frames with the video_frames format and to split a video into frames with the datum util split_video command (#555)
  • --subset parameter in the image_dir format (#555)
  • MediaManager API to control loaded media resources at runtime (#555)
  • Command to detect the format of a dataset (#576)
  • More comfortable access to library API via import datumaro (#630)
  • CLI command-like free functions (export, transform, ...) (#630)
  • Reading specific annotation files for train dataset in Cityscapes (#632)
  • Random sampling transforms (random_sampler, label_random_sampler) to create smaller datasets from bigger ones (#636, #640)
  • API to report dataset import and export progress; API to report dataset import and export errors and take action (skip, fail)
    (supported in COCO, VOC and YOLO formats) (#650)
  • Support for downloading the ImageNetV2 and COCO datasets (#653, #659)
  • A way for formats to signal that they don't support detection (#665)
  • Removal transforms to remove items/annotations/attributes from dataset (remove_items, remove_annotations, remove_attributes) (#670)
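As a rough illustration of what a random sampling transform does, the sketch below draws a reproducible subset of items from a larger collection. The function `sample_items` and its `count` and `seed` parameters are assumptions made for this example; they do not describe the actual options of random_sampler or label_random_sampler.

```python
import random


def sample_items(items, count, seed=None):
    """Return up to `count` items chosen at random without replacement.

    Hypothetical helper for illustration; not Datumaro's API.
    """
    items = list(items)
    # A dedicated Random instance keeps the result reproducible for a
    # given seed without touching the global random state.
    rng = random.Random(seed)
    return rng.sample(items, min(count, len(items)))


if __name__ == "__main__":
    print(sample_items(range(100), 5, seed=0))
```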

Changed

  • Allowed direct file paths in datum import. Such sources are imported as if the rpath parameter were specified; however, only the selected path is copied into the project (#555)
  • Improved stats performance, added new filtering parameters, image stats (unique, repeated) moved to the dataset section,
    removed mean and std from the dataset section (#621)
  • Allowed Image creation from just size info (#634)
  • Added image search in VOC XML-based subformats (#634)
  • Added image path equality checks in simple merge, when applicable (#634)
  • Supported saving box attributes when downloading the TFDS version of VOC (#668)
  • Switched to a pyproject.toml-based build (#671)

Deprecated

  • TBD

Removed

  • Official support of Python 3.6 (due to its EOL) (#617)
  • Backward compatibility annotation symbols in components.extractor (#630)

Fixed

  • Prohibited calling add, import and export commands without a project (#555)
  • Calling make_dataset on empty project tree now produces the error properly (#555)
  • Saving (overwriting) a dataset in a project when rpath is used (#613)
  • Preservation of the output image extension in the Resize transform (#606)
  • Memory overuse in the Resize transform (#607)
  • Invalid image pixels produced by the Resize transform (#618)
  • Numeric warnings that sometimes occurred in stats command (e.g. #607) (#621)
  • Added missing item attribute merging in simple merge (#634)
  • Inability to disambiguate VOC from LabelMe in some cases (#658)

Security

  • TBD

Release v0.2.3: Public dataset downloading

28 Jan 09:13
b0fa100

Added

  • Command to download public datasets (#582)
  • Extension autodetection in ByteImage (#595)
  • MPII Human Pose Dataset (import-only) (.mat and .json) (#584)
  • MARS format (import-only) (#585)

Changed

  • smooth_line from datumaro.util.annotation_util is renamed to approximate_line and has an updated interface (#592)
  • The pycocotools dependency lower bound is raised to 2.0.4 (#449)

Deprecated

  • Python 3.6 support

Fixed

  • Failures in multimerge when lines are not approximated and when there are no label categories (#592)
  • Inability to convert a LabelMe dataset that has no subsets (#600)

Release v0.2.2

24 Dec 09:24
b0fd519

Added

  • Video reading API (#521)
  • Python API documentation site (#526)
  • Mapillary Vistas dataset format (Import-only) (#537)
  • Datumaro can now be installed on Windows on Python 3.9 (#547)
  • SYNTHIA dataset format (Import-only) (#532)
  • Support of score attribute in KITTI detection (#571)
  • Support for Accuracy Checker dataset meta files in formats (#553, #569, #575)
  • VoTT dataset format (Import-only) (#573)
  • Image resizing transform (#581)

Changed

  • The following formats can now be detected unambiguously: ade20k2017, ade20k2020, camvid, coco, cvat, datumaro, icdar_text_localization, icdar_text_segmentation, icdar_word_recognition, imagenet_txt, kitti_raw, label_me, lfw, mot_seq, open_images, vgg_face2, voc, widerface, yolo (#531, #536, #550, #557, #558)
  • Allowed export options in the datum merge command (#545)

Deprecated

  • Using Image, ByteImage from datumaro.util.image - these classes are moved to datumaro.components.media (#538)

Removed

  • Equality comparison support between datumaro.components.media.Image and numpy.ndarray (#568)

Fixed

  • Bug #560: import issue with MOT dataset when using seqinfo.ini file (#564)
  • Empty lines in VOC subset lists are now ignored (#587)

Release v0.2.1

16 Nov 11:56
6f21792

A bugfix release. Relaxes some requirements on formats.

Added

  • Import for CelebA dataset format (#484)

Changed

  • File people.txt became optional in LFW (#509)
  • File image_ids_and_rotation.csv became optional in Open Images (#509)
  • Allowed underscores (_) in subset names in COCO (#509)
  • Allowed annotation files with arbitrary names in COCO (#509)
  • The icdar_text_localization format is no longer detected in every directory (#531)
  • Updated pycocotools version to 2.0.2 (#534)

Fixed

  • Unhandled exception when a file is specified as the source for a COCO or MOTS dataset (#530)

Release v0.2: Dataset versioning

14 Oct 15:42
7e8615c

This release adds dataset versioning capabilities and significantly changes the command line.
It also improves CLI and API documentation, and extends the transformations library.

A Datumaro project can contain and manage multiple datasets instead of a single one.
CLI operations can be applied to the whole project, or to separate datasets.
Datasets are now modified in place by default. The project layout is updated. To update
an old project to the new version, use datum project migrate.

Added

  • A new installation target: pip install datumaro[default], which should be
    used in most cases. The plain datumaro target is intended for library users (#238)
  • Dataset and project versioning capabilities (Git-like) (#238)
  • [CLI] "dataset revpath" concept in CLI, which allows passing a dataset path with
    the dataset format in diff, merge, explain and info CLI commands (#238)
  • [CLI] import, remove, commit, checkout, log, status, info CLI commands (#238)
  • [CLI] patch CLI command to patch one dataset from another (#401)
  • [CLI, API] ProjectLabels transform to change dataset labels for merging etc. (#401, #478)
  • [API] Type annotations and docs for Annotation classes (#493)
  • [formats] Support for custom labels in the KITTI detection format (#481)
  • [formats] Coco*Extractor classes now have an option to preserve label IDs from the
    original annotation file (#453)
  • [formats] Options to control label loading behavior in imagenet_txt import (#434, #489)
  • Data collection by telemetry. Check this notice about the details (#495)

Changed

  • A project can contain and manage multiple datasets instead of a single one.
    CLI operations can be applied to the whole project, or to separate datasets.
    Datasets are modified in place by default (#328)
  • [CLI] The import command copies datasets by default. Use add to add datasets without copying (#508)
  • [CLI] Projects use new file layout, incompatible with old projects.
    An old project can be updated with datum project migrate (#238)
  • [CLI] diff and ediff are joined into a single diff CLI command (#238)
  • [CLI] CLI help for builtin plugins doesn't require project (#328)
  • [API] The Project class from datumaro.components is changed completely (#238)
  • [API] Inheriting CliPlugin is not required in plugin classes (#238)
  • [API] Importers do not create Projects anymore and just return a list of
    extractor configurations (#238)
  • [API] Annotation-related classes were moved into a new module,
    datumaro.components.annotation (#439)
  • [API] Rollback utilities replaced with Scope utilities (#444)

Removed

  • [CLI] project merge CLI command (#238)
  • Support for project hierarchies. A project cannot be a source anymore (#238)
  • A project cannot have independent internal dataset anymore. All the project
    data must be stored in the project data sources (#238)
  • datumaro_project format (#238)
  • [API] Unused path field of DatasetItem (#455)

Fixed

  • Deprecation warning in open_images_format.py (#440)
  • lazy_image returning unrelated data sometimes (#409)
  • Invalid call to pycocotools.mask.iou (#450)
  • Importing of Open Images datasets without image data (#463)
  • Return value type in Dataset.is_modified (#401)
  • Incorrect remapping of secondary categories in RemapLabels (#401)
  • VOC dataset patching for classification and segmentation tasks (#478)
  • Exported mask label ids in KITTI segmentation (#481)
  • Missing label for Points read in the LFW format (#494)