Troubles running 'remap_labels' on ProjectDataset #402

Closed
kmuchmore opened this issue Aug 5, 2021 · 3 comments · Fixed by #238 or #407
Labels
BUG Something isn't working DOC Improvements or additions to documentation

Comments

@kmuchmore

In this example, project.make_dataset() returns a ProjectDataset. I am not able to run remap_labels on this dataset; it just raises TypeError: 'str' object is not callable. I looked through the tests, and they all seem to run remap_labels on datumaro.components.dataset.Dataset objects. I don't see any tests or examples that run it on datumaro.components.project.ProjectDataset.

I'm not sure if I'm doing something wrong, but running this example on both the master and development branches gives me the same TypeError: 'str' object is not callable error.

If I use the dataset.Dataset class it works fine, but I'm not sure how to take a ProjectDataset and turn it into just a Dataset.

from datumaro.components.project import Project

# load a Datumaro project
project = Project.load('directory')

# create a dataset
dataset = project.make_dataset()

# keep only annotated images
dataset.select(lambda item: len(item.annotations) != 0)

# change dataset labels
dataset.transform('remap_labels',
  {'cat': 'dog', # rename cat to dog
    'truck': 'car', # rename truck to car
    'person': '', # remove this label
  }, default='delete') # remove everything else
@kmuchmore
Author

As a workaround, I am running this instead:

import datumaro.plugins.transforms as transforms

dataset = transforms.RemapLabels(dataset,
  mapping={'cat': 'dog', # rename cat to dog
    'truck': 'car', # rename truck to car
    'person': '', # remove this label
  }, default='delete')

This seems to get me past that step, but it changes the type of the dataset to datumaro.plugins.transforms.RemapLabels.
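The type change is what one would expect when a transform class is called directly: the result is the transform object wrapping the original dataset, not a dataset. The sketch below is a toy model of that wrapping pattern and does not use Datumaro's code. ToyDataset, ToyRemapLabels, and the simplified from_extractors and export methods are all hypothetical stand-ins named for illustration only.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset: holds items and can export them."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

    @classmethod
    def from_extractors(cls, *sources):
        # Materialize the items of every source into a new ToyDataset.
        return cls(item for source in sources for item in source)

    def export(self):
        # A real exporter would write files; this toy version returns the items.
        return list(self.items)


class ToyRemapLabels:
    """Hypothetical transform: wraps a source and renames labels lazily."""

    def __init__(self, source, mapping):
        self.source = source
        self.mapping = mapping

    def __iter__(self):
        for label in self.source:
            # Labels missing from the mapping are kept unchanged in this toy version.
            yield self.mapping.get(label, label)


# Calling the transform class directly returns the wrapper, not a dataset.
wrapped = ToyRemapLabels(ToyDataset(['cat', 'truck']), {'cat': 'dog', 'truck': 'car'})
print(type(wrapped).__name__)      # ToyRemapLabels
print(hasattr(wrapped, 'export'))  # False

# Wrapping the result back into a dataset restores the dataset interface.
restored = ToyDataset.from_extractors(wrapped)
print(restored.export())           # ['dog', 'car']
```

In this toy model the wrapper can be iterated but has no dataset methods, so anything that needs the dataset interface has to rebuild a dataset from it first.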

@kmuchmore
Author

Alright, well that bombed when I tried to export the dataset. I'll keep looking for a solution. Any help on what I'm doing wrong or what I could correct would be great. Thanks!

@zhiltsov-max
Contributor

zhiltsov-max commented Aug 6, 2021

Hi, thanks for reporting the problem with the example. Yes, it looks like ProjectDataset was not updated after the recent Dataset changes: its transform method accepts only a transform class and does not allow a string name. The workaround you tried was correct; it simply does what happens under the hood. To make everything work, do any of the following:

# Option 1: in the example code, pass the transform class instead of the string name
# (the remaining arguments stay the same)
dataset.transform(project.env.transforms['remap_labels'],
# or
from datumaro.plugins.transforms import RemapLabels
dataset.transform(RemapLabels,
# but this only does what your workaround did, i.e. it will not return a Dataset instance

# Option 2: continue the workaround you tried with
from datumaro.components.dataset import Dataset
dataset = Dataset.from_extractors(dataset)
dataset.export('path/', 'format_name')

# Option 3: (answering your question about getting a Dataset from ProjectDataset)
dataset = Dataset.from_extractors(project.make_dataset())
... # the example code

All of this will produce the same results.
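For readers wondering where the TypeError comes from, here is a minimal sketch of the difference between a transform method that only accepts a class and one that also resolves string names. It is a toy model under stated assumptions, not Datumaro's implementation: ToyRename, TOY_REGISTRY, transform_class_only, and transform_with_names are hypothetical names introduced for illustration.

```python
class ToyRename:
    """Hypothetical transform class: renames items using a mapping."""

    def __init__(self, source, mapping):
        self.items = [mapping.get(item, item) for item in source]

    def __iter__(self):
        return iter(self.items)


# Hypothetical registry that maps transform names to transform classes.
TOY_REGISTRY = {'rename': ToyRename}


def transform_class_only(source, method, *args, **kwargs):
    # Calls whatever it is given; a string here raises TypeError.
    return method(source, *args, **kwargs)


def transform_with_names(source, method, *args, **kwargs):
    # Resolves a string name through the registry before calling.
    if isinstance(method, str):
        method = TOY_REGISTRY[method]
    return method(source, *args, **kwargs)


items = ['cat', 'truck']
mapping = {'cat': 'dog', 'truck': 'car'}

try:
    transform_class_only(items, 'rename', mapping)
except TypeError as error:
    message = str(error)
    print(message)  # 'str' object is not callable

by_class = list(transform_class_only(items, ToyRename, mapping))
by_name = list(transform_with_names(items, 'rename', mapping))
print(by_class == by_name)  # True
```

Passing the class works with both variants, which is why swapping the string for the class avoids the error in this model.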

@zhiltsov-max zhiltsov-max added BUG Something isn't working DOC Improvements or additions to documentation labels Aug 17, 2021