Filter out items without annotations #1208

Closed
CourchesneA opened this issue Nov 27, 2023 · 2 comments

CourchesneA commented Nov 27, 2023

I am trying to remove items for which I do not have annotations in a dataset. From the documentation, I have seen that the following could help me:

datum filter -m i+a ./my-dataset

It looks like a filter expression is required, so I added a generic filter that should match everything:

datum filter -m i+a -e '/item/image[has_data=1]' -o ./my-output-dataset --dry-run ./my-dataset

However, it always results in an empty output (the dry run returns nothing). Using datum stats, I can see that this test dataset has 13 images, 10 of which have annotations.

By removing the --mode flag, I get my whole dataset printed as XML and I can see that most of the items have annotations:

<item>
  <id>my_folder/my_image001</id>
  <subset>Test</subset>
  <image>
    <has_data>1</has_data>
   ...
  </image>
  <annotation>
    <id>0</id>
    <type>bbox</type>
    <occluded>False</occluded>
    <rotation>0.0</rotation>
    <group>0</group>
    <label>mylabel</label>
    <label_id>0</label_id>
    <x>1010.77001953125</x>
    <y>359.7699890136719</y>
    <w>61.1500244140625</w>
    <h>69.20001220703125</h>
    <area>4231.582435913384</area>
  </annotation>
</item>

I would expect -m i+a to output a valid dataset containing only the items for which annotations are available.

@jihyeonyi
Contributor

Hi @CourchesneA,
Thank you for reporting the issue.

First of all, the has_data field does not indicate whether an item has annotations; it indicates whether image data is present.
Since it belongs to the image node, using -m i will retrieve the items where has_data=1.
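
To make the distinction concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not Datumaro's implementation; the `<dataset>` wrapper element and the second item are assumptions added for illustration, modeled on the XML dump above. It shows that has_data is set on every item with image data, whether or not the item has annotations.

```python
import xml.etree.ElementTree as ET

# Hand-written XML modeled on the dump in this issue. The <dataset> wrapper
# and the second, unannotated item are assumptions made for illustration.
SAMPLE = """
<dataset>
  <item>
    <id>my_folder/my_image001</id>
    <image><has_data>1</has_data></image>
    <annotation><id>0</id><type>bbox</type><label>mylabel</label></annotation>
  </item>
  <item>
    <id>my_folder/my_image002</id>
    <image><has_data>1</has_data></image>
  </item>
</dataset>
"""

root = ET.fromstring(SAMPLE)

# Items whose image node reports has_data == 1.
with_image_data = [
    item.findtext("id")
    for item in root.iter("item")
    if item.findtext("image/has_data") == "1"
]

# Items that carry at least one <annotation> child.
with_annotations = [
    item.findtext("id")
    for item in root.iter("item")
    if item.find("annotation") is not None
]

print(with_image_data)
print(with_annotations)
```

Both items pass the has_data check, but only the first one has an annotation, so a has_data condition cannot separate annotated items from unannotated ones.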

To keep only the items that have annotations, you can use the following command:

datum filter -e '/item[annotation]'

The /item[annotation] expression checks whether the item node has a child named annotation.
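
As a rough illustration of what the [annotation] predicate selects, the sketch below applies the same kind of path to hand-written XML with Python's standard xml.etree.ElementTree. This only demonstrates the XPath predicate semantics; it is not how Datumaro evaluates filters internally, and the `<dataset>` wrapper and the item ids are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <dataset> wrapper and item ids are assumptions.
SAMPLE = """
<dataset>
  <item>
    <id>img_a</id>
    <annotation><id>0</id><type>bbox</type></annotation>
  </item>
  <item>
    <id>img_b</id>
    <annotation><id>0</id><type>bbox</type></annotation>
    <annotation><id>1</id><type>bbox</type></annotation>
  </item>
  <item>
    <id>img_c</id>
  </item>
</dataset>
"""

root = ET.fromstring(SAMPLE)

# "item[annotation]" keeps <item> elements that have at least one
# direct child named <annotation>; each matching item is returned once.
kept = [item.findtext("id") for item in root.findall("./item[annotation]")]

# Items without any <annotation> child fall outside the selection.
dropped = [
    item.findtext("id")
    for item in root.findall("./item")
    if item.find("annotation") is None
]

print("kept:", kept)
print("dropped:", dropped)
```

An item with several annotations is still selected once, and an item with no annotation child is excluded, which is the behavior asked for in the original question.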

I hope my response will be helpful.

@CourchesneA
Author

That is what I was looking for. Thanks !!

vinnamkim pushed a commit to vinnamkim/datumaro that referenced this issue Nov 28, 2023

### Summary
1. Correct the documentation to use the 'datum project import'
command correctly.
2. Add a filtering example that keeps only items containing
annotations. (openvinotoolkit#1208)

### How to test
<!-- Describe the testing procedure for reviewers, if changes are
not fully covered by unit tests or manual testing can be complicated.
-->

### Checklist
<!-- Put an 'x' in all the boxes that apply -->
- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into
[CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [x] I have updated the
[documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs)
accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT
License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE)
that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example
below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```