-
Notifications
You must be signed in to change notification settings - Fork 488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
De-duplicate goals for objectnav #309
Conversation
Codecov Report
@@ Coverage Diff @@
## master #309 +/- ##
==========================================
- Coverage 77.34% 77.09% -0.26%
==========================================
Files 108 108
Lines 7090 7124 +34
==========================================
+ Hits 5484 5492 +8
- Misses 1606 1632 +26 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, we can make this changes work on previous structure of Object Nav. It will check goals_by_category
, but won't fail if it's not optimized.
else: | ||
raise RuntimeError("Episodes have no goals") | ||
|
||
goals_by_category = deserialized["goals_by_category"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
goals_by_category = deserialized["goals_by_category"] | |
self.goals_by_category = deserialized["goals_by_category"] |
@@ -30,7 +30,23 @@ class ObjectNavDatasetV1(PointNavDatasetV1): | |||
content_scenes_path: str = "{data_path}/content/{scene}.json.gz" | |||
|
|||
def to_json(self) -> str: | |||
self.goals_per_category = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we will have this code in from_json
, then self.goals_per_category
will be save here automatically.
if goals_id not in self.goals_per_category: | ||
self.goals_per_category[goals_id] = ep.goals | ||
|
||
self.episodes[i].goals = goals_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key can be recreated from episode data:
self.episodes[i].goals = goals_id | |
self.episodes[i].goals = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is close to merging. Does test that reads and writes and then reads the dataset works fine with this code?
goals_by_category: Dict[str, List[ObjectGoal]] | ||
|
||
@staticmethod | ||
def dedup_goals(dset: Dict[str, Any]) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're doing change in place on dset
. Maybe, cleaner will be use this function as:
deserialized["goals_by_category"] = self.dedup_goals(deserialized)
and move episode changing part to closer to episode.goals = self.goals_by_category[episode.goals_key]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in-place is done on purpose. It makes a considerable difference on the time it takes to construct the list of episodes if you do this change in-place first.
habitat/tasks/nav/object_nav_task.py
Outdated
|
||
:param goals_key: Key to retrieve goals with | ||
""" | ||
goals_key: str = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this key only once. I would use function/labda instead and avoid creating it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need it all the time. We need to save out the goals_key
to disk. The de-duplicated dataset should be saved to disk for the best performance benefits (for reference, running de-dup takes 30 minutes, so it really should be done just once).
@@ -63,8 +114,18 @@ def from_json( | |||
self.category_to_scene_annotation_category_id.keys() | |||
), "category_to_task and category_to_mp3d must have the same keys" | |||
|
|||
for episode in deserialized["episodes"]: | |||
episode = NavigationEpisode(**episode) | |||
if len(deserialized["episodes"]) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to drop stop it here? I can imagine that some data that could left after deleting episodes won't be read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there aren't any episodes, there aren't any goals, so stopping here makes sense.
Looks good to go. One last piece worth to check how it will work with |
* added MeshTransformNode to store mesh hierarchies and relative transforms for loaded assets in MeshMetaData * refactor addComponent to utilize stored transforms instead of re-importing from file
* De-duplicate goals for objectnav
Motivation and Context
The object nav datasets are huge, both on disk and in memory. This is largely due to the goals being duplicated across different episodes to the same object category. This PR resolves this by de-duplicating the goals.
Script to dedup current episodes: https://gist.github.com/erikwijmans/49925738798c5ced3d50e12e3d13b491
How Has This Been Tested
It loads the episodes!
Types of changes