Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kate/splitter #68

Merged
merged 8 commits into from
Dec 30, 2020
Merged

Kate/splitter #68

merged 8 commits into from
Dec 30, 2020

Conversation

jihyeonyi
Copy link
Contributor

@jihyeonyi jihyeonyi commented Dec 23, 2020

Summary

Added task-specific annotation splitters.

  • SplitforClassification
  • SplitforMatchingReID
  • SplitforDetection

How to test

python -m unittest -v tests/test_splitter.py

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

@jihyeonyi jihyeonyi force-pushed the kate/splitter branch 2 times, most recently from 6d08cd7 to e9290f5 Compare December 23, 2020 08:41
Copy link
Contributor

@zhiltsov-max zhiltsov-max left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for a great PR!

I'd suggest fixing code formatting (spaces in arguments) and string formatting to keep the project code in a single style.

datumaro/plugins/splitter.py Outdated Show resolved Hide resolved
datumaro/plugins/splitter.py Outdated Show resolved Hide resolved
def __init__(self, dataset, seed):
super().__init__(dataset)

np.random.seed(seed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this should be done closer to the actual use, because a lot can happen after constructor is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random shuffling occurs several times.
Do you think I really need to set seed whenever I use the shuffling function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make sure the results are reproducible. I suggest moving seed setting as much close to the point of use, as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, I'll add test codes to check the results are reproducible.

datumaro/plugins/splitter.py Show resolved Hide resolved
datumaro/plugins/splitter.py Outdated Show resolved Hide resolved
tests/test_splitter.py Outdated Show resolved Hide resolved
tests/test_splitter.py Outdated Show resolved Hide resolved
tests/test_splitter.py Outdated Show resolved Hide resolved
tests/test_splitter.py Outdated Show resolved Hide resolved
tests/test_splitter.py Outdated Show resolved Hide resolved
@jihyeonyi
Copy link
Contributor Author

I need help regarding complexity.
I refactor the code and get complexities like below.

$ python -m mccabe datumaro/plugins/splitter.py --min 10
188:4: 'SplitforMatchingReID.__init__' 16
307:4: 'SplitforMatchingReID._rebalancing' 12
370:4: 'SplitforDetection.__init__' 19

But Codacy gives 35 for SplitforMatchingReID.init and 20 for SplitforDetection.init
Why they're different? And how can resolve this issue?

@zhiltsov-max
Copy link
Contributor

Complexity from these tools is not something we look at. Don't bother with this.

Comment on lines +282 to +336
for idx, item in enumerate(self._extractor):
for subset, split in by_groups.items():
if idx in split:
group_id = self._group_map[subset]
item.annotations[0].group = group_id
break
Copy link
Contributor Author

@jihyeonyi jihyeonyi Dec 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I use 'group' property to split 'test' subset into 'query' and 'gallery' set.
It seems that the default subset doesn't support a hierarchy.
In the original implementation, we add "Auxiliary" attribute to the annotation.
Do you think I have to change the current implementation w.r.t the original one?
Or do I have to implement the subset hierarchy?
Please share your opinion.

Copy link
Contributor

@zhiltsov-max zhiltsov-max Dec 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there in no exact support for split hierarchy yet. To do what you want (as a workaround?), I'd create 2 extra subsets - test_gallery and test_query, this would look quite natural (at least, in Datumaro). Technically, subsets are extractors themselves, so it is possible to extend the implementation with some nesting. But it would require lots of changes in the importing/exporting code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the gallery and query sets are not subsets. They're a kind of tag.
I need more time to resolve this, so let's leave this as it is at the moment.
I'll resolve this issue while applying the command line feature.

@jihyeonyi jihyeonyi force-pushed the kate/splitter branch 2 times, most recently from 489e5eb to b712db3 Compare December 24, 2020 11:00
@jihyeonyi
Copy link
Contributor Author

I've rebased and squashed some commits.
I'd like to resolve the gallery/query tagging issue and support command line execution in the future pull request.

zhiltsov-max
zhiltsov-max previously approved these changes Dec 30, 2020
@zhiltsov-max zhiltsov-max merged commit 6f1f494 into develop Dec 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants