CLI update from diff to compare & add TableComparator #1012

Merged on Jun 7, 2023 (43 commits)

@JihwanEom JihwanEom commented May 23, 2023

Summary

  • CLI update from diff to compare
  • Add TableComparator functionality as new method
  • Refactor existing features for HLOps structure
  • Fix broken links in project.rst

How to test

datum compare tests/assets/cityscapes_dataset/train_dataset/ tests/assets/ade20k2020_dataset/

# Save the output JSON and table
datum compare tests/assets/cityscapes_dataset/train_dataset/ tests/assets/ade20k2020_dataset/ --output-dir compare_test

Output preview

$ datum compare tests/assets/cityscapes_dataset/train_dataset/ tests/assets/ade20k2020_dataset/
High-level comparison:
+--------------------------+-------------------------------------+------------------------+
| Field                    | First                               | Second                 |
+==========================+=====================================+========================+
| Format                   | cityscapes                          | ade20k2020             |
+--------------------------+-------------------------------------+------------------------+
| Number of classes        | 20                                  | 4                      |
+--------------------------+-------------------------------------+------------------------+
| Common classes           | car, person                         | car, person            |
+--------------------------+-------------------------------------+------------------------+
| Classes                  | background, bicycle, building, bus, | car, door, person, rim |
|                          | car, fence, motorcycle, person,     |                        |
|                          | pole, rider, road, sidewalk, sky,   |                        |
|                          | terrain, trafficlight, trafficsign, |                        |
|                          | train, truck, vegetation, wall      |                        |
+--------------------------+-------------------------------------+------------------------+
| Images count             | 4                                   | 3                      |
+--------------------------+-------------------------------------+------------------------+
| Unique images count      | 1                                   | 1                      |
+--------------------------+-------------------------------------+------------------------+
| Repeated images count    | 1                                   | 1                      |
+--------------------------+-------------------------------------+------------------------+
| Annotations count        | 8                                   | 22                     |
+--------------------------+-------------------------------------+------------------------+
| Unannotated images count | 0                                   | 0                      |
+--------------------------+-------------------------------------+------------------------+

Mid-level comparison:
+-------------------------------------+--------------------------+--------------------------+
| Field                               | First                    | Second                   |
+=====================================+==========================+==========================+
| test - Image Mean                   | 1.00,   1.00,   1.00     |                          |
+-------------------------------------+--------------------------+--------------------------+
| test - Image Std                    | 0.00,   0.00,   0.00     |                          |
+-------------------------------------+--------------------------+--------------------------+
| train - Image Mean                  | 1.00,   1.00,   1.00     |                          |
+-------------------------------------+--------------------------+--------------------------+
| train - Image Std                   | 0.00,   0.00,   0.00     |                          |
+-------------------------------------+--------------------------+--------------------------+
| val - Image Mean                    | 1.00,   1.00,   1.00     |                          |
+-------------------------------------+--------------------------+--------------------------+
| val - Image Std                     | 0.00,   0.00,   0.00     |                          |
+-------------------------------------+--------------------------+--------------------------+
| dataset - Image Mean                |                          | 1.00,   1.00,   1.00     |
+-------------------------------------+--------------------------+--------------------------+
| dataset - Image Std                 |                          | 0.00,   0.00,   0.00     |
+-------------------------------------+--------------------------+--------------------------+
| dataset_with_meta_file - Image Mean |                          | 1.00,   1.00,   1.00     |
+-------------------------------------+--------------------------+--------------------------+
| dataset_with_meta_file - Image Std  |                          | 0.00,   0.00,   0.00     |
+-------------------------------------+--------------------------+--------------------------+
| Label - background                  | imgs: 4, percent: 0.5000 |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - bicycle                     |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - building                    |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - bus                         |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - car                         |                          | imgs: 6, percent: 0.2727 |
+-------------------------------------+--------------------------+--------------------------+
| Label - fence                       |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - motorcycle                  |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - person                      | imgs: 1, percent: 0.1250 | imgs: 8, percent: 0.3636 |
+-------------------------------------+--------------------------+--------------------------+
| Label - pole                        |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - rider                       |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - road                        |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - sidewalk                    |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - sky                         |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - terrain                     |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - trafficlight                |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - trafficsign                 |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - train                       | imgs: 1, percent: 0.1250 |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - truck                       | imgs: 1, percent: 0.1250 |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - vegetation                  |                          |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - wall                        | imgs: 1, percent: 0.1250 |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - door                        |                          | imgs: 6, percent: 0.2727 |
+-------------------------------------+--------------------------+--------------------------+
| Label - rim                         |                          | imgs: 2, percent: 0.0909 |
+-------------------------------------+--------------------------+--------------------------+

Low-level comparison:
+-----------------+---------+
| Field           |   Value |
+=================+=========+
| Covariate shift |   0     |
+-----------------+---------+
| Label shift     |   0.999 |
+-----------------+---------+

Test coverage

Name                                                                                   Stmts   Miss Branch BrPart  Cover
------------------------------------------------------------------------------------------------------------------------
datumaro/plugins/comparator.py                                                           321     37    120      8    87%

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added the description of my changes into CHANGELOG.
  • I have updated the documentation accordingly

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

@JihwanEom JihwanEom changed the base branch from develop to releases/1.3.0 May 23, 2023 06:49
@JihwanEom JihwanEom changed the base branch from releases/1.3.0 to develop May 24, 2023 05:46
@JihwanEom JihwanEom changed the title Add comparator CLI Update from diff to compare & add TableComparator May 25, 2023
@JihwanEom JihwanEom changed the title CLI Update from diff to compare & add TableComparator CLI update from diff to compare & add TableComparator May 25, 2023
@JihwanEom JihwanEom changed the title CLI update from diff to compare & add TableComparator CLI update diff => compare & add TableComparator May 25, 2023
@JihwanEom JihwanEom changed the title CLI update diff => compare & add TableComparator CLI update from diff to compare & add TableComparator May 25, 2023
@JihwanEom

Could you clarify the differences between DistanceComparator, EqualityComparator, and TableComparator?

The DistanceComparator performs an item-wise comparison for the label, bbox, polygon, and mask annotation types, and saves a confusion matrix for each type.
The EqualityComparator checks whether the annotations in the two given datasets are exactly the same.
The TableComparator is the new feature proposed in this PR. Rather than comparing item by item, it summarizes each dataset as a whole at three levels. The high-level comparison reports each dataset's format, number of classes, common classes, and image and annotation counts. The mid-level comparison presents image and annotation statistics: the image mean and standard deviation for each subset, and the count and share of each label. The low-level comparison uses a shift analyzer to report covariate shift and label shift.
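As a rough illustration of the "Number of classes" and "Common classes" rows in the high-level comparison, the following sketch uses plain Python sets. It is not the TableComparator implementation; `compare_classes` is a hypothetical helper, and the class lists are copied from the ade20k2017 / ade20k2020 example output in this thread.

```python
def compare_classes(first, second):
    """Summarize two class lists (hypothetical helper, not Datumaro API)."""
    # Classes present in both datasets, sorted for stable display.
    common = sorted(set(first) & set(second))
    return {
        "num_classes": (len(first), len(second)),
        "common_classes": common,
    }


# Class lists taken from the example output in this thread.
ade20k2017 = ["license plate", "person", "rim", "sky"]
ade20k2020 = ["car", "door", "person", "rim"]

summary = compare_classes(ade20k2017, ade20k2020)
print("Number of classes:", summary["num_classes"])
print("Common classes:", ", ".join(summary["common_classes"]))
```

Sorting the intersection keeps the row deterministic regardless of the order in which labels were declared.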

Below are examples of the output for each comparator.

datum compare tests/assets/ade20k2017_dataset/ tests/assets/ade20k2020_dataset/ --method distance --output-dir output/distance
datum compare tests/assets/ade20k2017_dataset/ tests/assets/ade20k2020_dataset/ --method equality --output-dir output/equality 
datum compare tests/assets/ade20k2017_dataset/ tests/assets/ade20k2020_dataset/ --method table --output-dir output/table
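If these three invocations need to be scripted, one option is to build the argument lists in Python. This is only a sketch: `build_compare_command` is a hypothetical helper, and it uses only the `--method` and `--output-dir` options that appear in the commands above.

```python
import shlex


def build_compare_command(first, second, method=None, output_dir=None):
    """Build an argv list for `datum compare` (hypothetical helper)."""
    cmd = ["datum", "compare", first, second]
    if method is not None:
        cmd += ["--method", method]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd


first = "tests/assets/ade20k2017_dataset/"
second = "tests/assets/ade20k2020_dataset/"

for method in ("distance", "equality", "table"):
    cmd = build_compare_command(
        first, second, method=method, output_dir=f"output/{method}"
    )
    # Print the shell-quoted form; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems if a dataset path contains spaces.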

DistanceComparator

mask_confusion.png
polygon_confusion.png

EqualityComparator

equality_compare.json

{
  "mismatches": [],
  "a_extra_items": [],
  "b_extra_items": [],
  "errors": [
    {
      "type": "labels",
      "message": "Lists differ: [Labe[22 chars]ame='sky', parent='', attributes=set()), Label[203 chars]t())] != [Labe[22 chars]ame='car', parent='', attributes=set()), Label[194 chars]t())]\n\nFirst differing element 0:\nLabelCategories.Category(name='sky', parent='', attributes=set())\nLabelCategories.Category(name='car', parent='', attributes=set())\n\n- [LabelCategories.Category(name='sky', parent='', attributes=set()),\n?                                 ^^^\n\n+ [LabelCategories.Category(name='car', parent='', attributes=set()),\n?                                 ^^^\n\n   LabelCategories.Category(name='person', parent='', attributes=set()),\n-  LabelCategories.Category(name='license plate', parent='', attributes=set()),\n?                                 ^^^^^^^^^^^^^\n\n+  LabelCategories.Category(name='door', parent='', attributes=set()),\n?                                 ^^^^\n\n   LabelCategories.Category(name='rim', parent='', attributes=set())]"
    }
  ]
}

TableComparator

table_compare.txt

High-level Comparison:
+--------------------------+---------------------------------+------------------------+
| Field                    | First                           | Second                 |
+==========================+=================================+========================+
| Format                   | ade20k2017                      | ade20k2020             |
+--------------------------+---------------------------------+------------------------+
| Number of classes        | 4                               | 4                      |
+--------------------------+---------------------------------+------------------------+
| Common classes           | person, rim                     | person, rim            |
+--------------------------+---------------------------------+------------------------+
| Classes                  | license plate, person, rim, sky | car, door, person, rim |
+--------------------------+---------------------------------+------------------------+
| Images count             | 3                               | 3                      |
+--------------------------+---------------------------------+------------------------+
| Unique images count      | 1                               | 1                      |
+--------------------------+---------------------------------+------------------------+
| Repeated images count    | 1                               | 1                      |
+--------------------------+---------------------------------+------------------------+
| Annotations count        | 10                              | 22                     |
+--------------------------+---------------------------------+------------------------+
| Unannotated images count | 0                               | 0                      |
+--------------------------+---------------------------------+------------------------+

Mid-level Comparison:
+-------------------------------------+--------------------------+--------------------------+
| Field                               | First                    | Second                   |
+=====================================+==========================+==========================+
| dataset - Image Mean                | 1.00,   1.00,   1.00     | 1.00,   1.00,   1.00     |
+-------------------------------------+--------------------------+--------------------------+
| dataset - Image Std                 | 0.00,   0.00,   0.00     | 0.00,   0.00,   0.00     |
+-------------------------------------+--------------------------+--------------------------+
| dataset_with_meta_file - Image Mean | 1.00,   1.00,   1.00     | 1.00,   1.00,   1.00     |
+-------------------------------------+--------------------------+--------------------------+
| dataset_with_meta_file - Image Std  | 0.00,   0.00,   0.00     | 0.00,   0.00,   0.00     |
+-------------------------------------+--------------------------+--------------------------+
| Label - license plate               | imgs: 3, percent: 0.3000 |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - person                      | imgs: 3, percent: 0.3000 | imgs: 8, percent: 0.3636 |
+-------------------------------------+--------------------------+--------------------------+
| Label - rim                         | imgs: 1, percent: 0.1000 | imgs: 2, percent: 0.0909 |
+-------------------------------------+--------------------------+--------------------------+
| Label - sky                         | imgs: 3, percent: 0.3000 |                          |
+-------------------------------------+--------------------------+--------------------------+
| Label - car                         |                          | imgs: 6, percent: 0.2727 |
+-------------------------------------+--------------------------+--------------------------+
| Label - door                        |                          | imgs: 6, percent: 0.2727 |
+-------------------------------------+--------------------------+--------------------------+

Low-level Comparison:
+-----------------+---------+
| Field           |   Value |
+=================+=========+
| Covariate shift |    0    |
+-----------------+---------+
| Label shift     |    0.75 |
+-----------------+---------+
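The "percent" values in the mid-level label rows are consistent with each label's count divided by the dataset's total annotation count (for example, 3 / 10 = 0.3000 for the first dataset and 8 / 22 = 0.3636 for the second). The sketch below reproduces that arithmetic under this assumption; `label_distribution` is a hypothetical helper, not the code added in this PR.

```python
def label_distribution(counts):
    """Return each label's share of all annotations (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for a dataset without annotations.
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


# Per-label counts copied from the "First" column of the table above.
first_counts = {"license plate": 3, "person": 3, "rim": 1, "sky": 3}

dist = label_distribution(first_counts)
for name, share in dist.items():
    print(f"Label - {name}: imgs: {first_counts[name]}, percent: {share:.4f}")
```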

wonjuleee previously approved these changes Jun 5, 2023
sooahleex previously approved these changes Jun 5, 2023
@JihwanEom JihwanEom dismissed stale reviews from sooahleex and wonjuleee via d9f71a5 June 5, 2023 06:49
@vinnamkim vinnamkim added this to the 1.4.0 milestone Jun 5, 2023
@vinnamkim vinnamkim added FEATURE New feature & functionality refactoring labels Jun 5, 2023
@JihwanEom JihwanEom marked this pull request as draft June 7, 2023 01:52
@JihwanEom JihwanEom marked this pull request as ready for review June 7, 2023 02:10

@vinnamkim vinnamkim left a comment

LGTM.

@JihwanEom JihwanEom merged commit a702d61 into openvinotoolkit:develop Jun 7, 2023
vinnamkim added a commit that referenced this pull request Jun 30, 2023
- Resolve #1065 
- `tabulate` was introduced in #1012 but was not added to requirements-core.txt.
- We did not notice this earlier because `tabulate` is a dependency of `dvc`, so nothing broke when `datumaro[default]` was installed.

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
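
A missing dependency of this kind can be caught with a small check run in an environment that has only the core requirements installed. The sketch below is an illustration, not part of the referenced commit; `missing_modules` is a hypothetical helper built on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# `tabulate` is the module named in the commit message above; the result
# depends on which packages are installed in the current environment.
print(missing_modules(["tabulate"]))
```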