
Add support for importing semantics from https://3dscenegraph.stanford.edu for use with gibson dataset #374

Open
msbaines opened this issue Dec 10, 2019 · 9 comments

@msbaines

🚀 Feature

We want to be able to use the semantic dataset from https://3dscenegraph.stanford.edu/ with the scene dataset from http://gibsonenv.stanford.edu/

@msbaines

The 3dscenegraph semantic dataset is currently limited to gibson_tiny. However, semantics for gibson_medium are expected to be released soon.

@msbaines

msbaines commented Dec 10, 2019

The mesh used for semantics is different from the mesh used for Habitat. The coordinate system also differs between the two meshes: the Y and Z axes are swapped, but the origin is the same. We should be able to generate a semantic .ply mesh from the original .obj mesh by transforming the vertex coordinates.

@msbaines

The semantic data from 3DSceneGraph is distributed as a .npz file.

To access the data (with numpy imported as np):

data = np.load(npz_path, allow_pickle=True)['output'].item()

This returns a dictionary with the following keys:

dict_keys(['building', 'room', 'object', 'camera', 'panorama'])

building

>>> pprint.pprint(data['building'])
{'floor_area': 35.04333970662052,
 'function': 'residential',
 'gibson_split': 'tiny',
 'id': 2,
 'name': 'Allensville',
 'num_cameras': 26,
 'num_floors': 1,
 'num_objects': 33,
 'num_rooms': 11,
 'object_inst_segmentation': array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]]),
 'object_voxel_occupancy': array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]]),
 'reference_point': array([0., 0., 0.]),
 'room_inst_segmentation': array([[10.],
       [ 8.],
       [ 8.],
       ...,
       [11.],
       [11.],
       [11.]]),
 'room_voxel_occupancy': array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]]),
 'size': array([9.76260114, 8.85856447, 2.50682107]),
 'volume': 201.63530725199752,
 'voxel_centers': array([[-0.98990329, -0.97658977, -0.02014603],
       [-0.98990329, -0.97658977,  0.07985397],
       [-0.98990329, -0.97658977,  0.17985397],
       ...,
       [ 8.71009671,  7.82341023,  2.27985397],
       [ 8.71009671,  7.82341023,  2.37985397],
       [ 8.71009671,  7.82341023,  2.47985397]]),
 'voxel_resolution': array([98, 89, 26]),
 'voxel_size': 0.1}

room

>>> pprint.pprint(data['room'])
{1: {'floor_area': 8.73437848991798,
     'floor_number': 'A',
     'id': 1,
     'location': array([3.53988  , 0.2945975, 1.116783 ]),
     'parent_building': 2,
     'scene_category': 'bathroom',
     'size': array([2.5752  , 2.370445, 2.254074]),
     'volume': 10.48092837735436},
 2: {'floor_area': 9.826049281845457,
     'floor_number': 'A',
     'id': 2,
     'location': array([0.42217  , 2.404948 , 1.1276055]),
     'parent_building': 2,
     'scene_category': 'bathroom',
     'size': array([2.88698 , 2.927084, 2.259769]),
     'volume': 13.84569568093703},
 3: {'floor_area': 11.789331640706246,
     'floor_number': 'A',
     'id': 3,
     'location': array([6.99981 , 0.605225, 1.24227 ]),
     'parent_building': 2,
     'scene_category': 'bedroom',
     'size': array([3.39914, 3.06071, 2.48226]),
     'volume': 23.095719066402776},

 ...

 11: {'floor_area': 7.659161263755288,
      'floor_number': 'A',
      'id': 11,
      'location': array([ 0.2298935, -0.0203395,  1.2285965]),
      'parent_building': 2,
      'scene_category': 'lobby',
      'size': array([2.361493, 1.971701, 2.453287]),
      'volume': 9.14560537370886}}

object

>>> pprint.pprint(data['object'])
{1: {'action_affordance': ['open', 'close', 'cook', 'heat', 'defrost', 'clean'],
     'class_': 'microwave',
     'floor_area': 2.826599475275465,
     'id': 1,
     'location': array([2.83998585, 4.76085063, 1.49223023]),
     'material': ['glass', 'metal'],
     'parent_room': 9,
     'size': array([0.40677453, 1.2802279 , 0.45474387]),
     'surface_coverage': 0.6978848300032634,
     'tactile_texture': None,
     'visual_texture': None,
     'volume': 0.08689193617144757},
 2: {'action_affordance': ['open',
                           'close',
                           'heat',
                           'turn on',
                           'turn off',
                           'clean'],
     'class_': 'oven',
     'floor_area': 3.1440354889034574,
     'id': 2,
     'location': array([2.98861606, 4.78304369, 0.46367262]),
     'material': ['metal', 'glass'],
     'parent_room': 9,
     'size': array([0.7124521 , 1.00192841, 0.94029514]),
     'surface_coverage': 1.3838881855100549,
     'tactile_texture': None,
     'visual_texture': None,
     'volume': 0.32710032579657},
 3: {'action_affordance': ['wash', 'clean'],
     'class_': 'sink',
     'floor_area': 1.7597120848145011,
     'id': 3,
     'location': array([ 4.23522156, -0.57456161,  0.91402512]),
     'material': ['ceramic', None],
     'parent_room': 1,
     'size': array([0.57416017, 0.54074392, 0.17042408]),
     'surface_coverage': 0.2042751198409106,
     'tactile_texture': None,
     'visual_texture': None,
     'volume': 0.014058957460380607},

 ...

 33: {'action_affordance': ['sit at',
                            'lay on',
                            'pick up',
                            'move',
                            'clean',
                            'set',
                            'decorate'],
      'class_': 'dining table',
      'floor_area': 3.473003668596787,
      'id': 33,
      'location': array([4.48357247, 6.70686119, 0.5614044 ]),
      'material': ['wood', None],
      'parent_room': 8,
      'size': array([1.14685995, 0.68447991, 0.6124021 ]),
      'surface_coverage': 1.3836484369355602,
      'tactile_texture': None,
      'visual_texture': None,
      'volume': 0.27629888509653355}}

camera

>>> pprint.pprint(data['camera'])
{1: {'FOV': 1.0489180166567196,
        'id': 1,
        'location': array([6.19820356, 4.94441748, 1.27608538]),
        'modality': 'RGB',
        'name': 'point_0_view_0',
        'parent_room': 11,
        'resolution': array([1024, 1024]),
        'rotation': [1.616633415222168, -0.01483128871768713, 1.8443574905395508]},

 ...

 2863: {'FOV': 1.0376312872786024,
        'id': 2863,
        'location': array([3.41174531, 6.73860884, 1.23510659]),
        'modality': 'RGB',
        'name': 'point_9_view_4',
        'parent_room': 11,
        'resolution': array([1024, 1024]),
        'rotation': [1.9443336725234985,
                     0.0011426351265981793,
                     -1.8368979692459106]}}

panorama

>>> pprint.pprint(data['panorama'])
{'p000001': {'object_class': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16),
             'object_instance': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16)},
 'p000002': {'object_class': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16),
             'object_instance': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16)},

 ...

 'p000026': {'object_class': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16),
             'object_instance': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16)}}
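
As an illustration of how these tables link together, the object and room dictionaries can be cross-referenced directly (ids taken from the Allensville dump above):

>>> obj = data['object'][3]                  # the 'sink' instance
>>> room = data['room'][obj['parent_room']]  # parent_room is 1
>>> obj['class_'], room['scene_category']
('sink', 'bathroom')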

@msbaines

I was hoping to load the .npz file in C++ using cnpy, but the .npz contains pickled data, which cnpy can't handle. I could potentially handle the pickled data using http://www.picklingtools.com/.

Alternatively, I could do all the processing in Python, but the Python tools for writing a mesh are cumbersome and likely slow. So for now, I think I will write a Python script to convert the data I need into a format that can be easily loaded in C++ and then do the processing in C++.
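
A rough sketch of what that conversion script could look like (the file names and the JSON layout here are placeholders, not a final format):

import json
import numpy as np

data = np.load(npz_path, allow_pickle=True)['output'].item()

# Per-face object ids as a flat int16 array, trivially readable from C++.
face_ids = data['building']['object_inst_segmentation'].astype(np.int16).ravel()
face_ids.tofile('object_ids.bin')

# Per-object metadata (class, location, size) as JSON for easy parsing in C++.
objects = {int(oid): {'class': obj['class_'],
                      'location': obj['location'].tolist(),
                      'size': obj['size'].tolist()}
           for oid, obj in data['object'].items()}
with open('objects.json', 'w') as f:
    json.dump(objects, f, indent=2)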

@msbaines

The semantic mask information is located in:

data['building']['object_inst_segmentation']

which is an array with one entry per mesh face. Each element contains the semantic object_id for that face; if there is no semantic information for a face, the id is 0.

The format of the array:

>>> type(data['building']['object_inst_segmentation'])
<class 'numpy.ndarray'>
>>> type(data['building']['object_inst_segmentation'][0])
<class 'numpy.ndarray'>
>>> data['building']['object_inst_segmentation'][0]
array([0.])
>>> data['building']['object_inst_segmentation'][0][0]
0.0
>>> type(data['building']['object_inst_segmentation'][0][0])
<class 'numpy.float64'>
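
Since the ids come back as an (N, 1) float64 array, it helps to flatten and cast them before use; a quick sanity check (variable names are just illustrative):

face_ids = data['building']['object_inst_segmentation'].astype(np.int64).ravel()
print(face_ids.shape)       # one id per mesh face
print(np.unique(face_ids))  # 0 (unannotated) plus the object ids present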

@msbaines

We should be able to write out the object ids using the following code:

object_ids = data['building']['object_inst_segmentation']
# One int16 id per face; 0 means the face has no semantic annotation.
with open("out.bin", "wb") as f:
    f.write(object_ids.astype(np.int16).tobytes())
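
int16 should be plenty here since Allensville only has 33 objects. A quick read-back check that the file round-trips (out.bin matching the snippet above):

ids = np.fromfile("out.bin", dtype=np.int16)
assert np.array_equal(ids, object_ids.astype(np.int16).ravel())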

@msbaines

Bounding box information:

Looking at the object schema:

 33: {'action_affordance': ['sit at',
                            'lay on',
                            'pick up',
                            'move',
                            'clean',
                            'set',
                            'decorate'],
      'class_': 'dining table',
      'floor_area': 3.473003668596787,
      'id': 33,
      'location': array([4.48357247, 6.70686119, 0.5614044 ]),
      'material': ['wood', None],
      'parent_room': 8,
      'size': array([1.14685995, 0.68447991, 0.6124021 ]),
      'surface_coverage': 1.3836484369355602,
      'tactile_texture': None,
      'visual_texture': None,
      'volume': 0.27629888509653355}}

location and size may be the center and full extents of the axis-aligned bounding box. This will have to be verified; a rough check against the mesh is sketched below.
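
A rough verification sketch, assuming the face order in the .obj matches object_inst_segmentation and that the mesh and the annotations share a coordinate frame (the .obj path is an example):

import numpy as np

# Minimal .obj reader: vertices and triangle vertex indices only.
verts, faces = [], []
with open('Allensville/mesh.obj') as f:
    for line in f:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == 'v':
            verts.append([float(x) for x in parts[1:4]])
        elif parts[0] == 'f':
            # Keep only the vertex index from "v", "v/vt" or "v/vt/vn".
            faces.append([int(p.split('/')[0]) - 1 for p in parts[1:4]])
verts, faces = np.array(verts), np.array(faces)

face_ids = data['building']['object_inst_segmentation'].astype(np.int64).ravel()
for oid, obj in data['object'].items():
    pts = verts[faces[face_ids == oid].ravel()]
    if len(pts) == 0:
        continue
    center = (pts.min(axis=0) + pts.max(axis=0)) / 2
    extent = pts.max(axis=0) - pts.min(axis=0)
    print(oid, obj['class_'], center - obj['location'], extent - obj['size'])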

@msbaines

Semantics seem to be working with the following transformation:

x1 = x0
y1 = -z0
z1 = y0
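
A minimal sketch of applying that transform to the .obj vertices before writing the semantic .ply (vertices assumed to be an (N, 3) numpy array in the original mesh frame):

# x1 = x0, y1 = -z0, z1 = y0 (swap Y and Z, negate the new Y)
v = np.asarray(vertices)                 # (N, 3) original .obj vertices
semantic_v = np.stack([v[:, 0],          # x1 =  x0
                       -v[:, 2],         # y1 = -z0
                       v[:, 1]], axis=1) # z1 =  y0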

msbaines pushed a commit that referenced this issue Dec 17, 2019
It is a two-step process to create a Gibson semantic mesh. First
you need to extract the object_id table from the .npz file. Then
you can create the semantic_mesh from the extracted ids file
and the .obj file the .npz is based on.

Addresses: Issue #374
mathfac added a commit that referenced this issue Jan 2, 2020
…is missing (#406)

With current code state 3dscenegraph semantic annotation files (*.scn) won't load, as our semantic loading pipeline triggers only on *.house files. To enable functionality implemented in #393 and #374 added loading of Gibson Semantics scene if MP3D semantic is missing. To test semantic loading e2e added integration test that will run only when *.scn test data is available.
mathfac added a commit that referenced this issue Jan 18, 2020
…ibson semantic scenes (#407)

To leverage 3dscenegraph semantic annotation spatial information added support of object's bounding boxes to Gibson semantic scenes. Related to issue #374 and depends on PR #406.
@ybgdgh

ybgdgh commented Sep 15, 2020

Hello, I want to know whether we can get the rooms' centers and bounding boxes from Habitat in the Gibson dataset. I used 3DSceneGraph as the Gibson semantics but only get the SemanticObject class. Thanks!

Ram81 pushed a commit to Ram81/habitat-web-sim that referenced this issue Dec 10, 2020
…is missing (facebookresearch#406)

With current code state 3dscenegraph semantic annotation files (*.scn) won't load, as our semantic loading pipeline triggers only on *.house files. To enable functionality implemented in facebookresearch#393 and facebookresearch#374 added loading of Gibson Semantics scene if MP3D semantic is missing. To test semantic loading e2e added integration test that will run only when *.scn test data is available.
Ram81 pushed a commit to Ram81/habitat-web-sim that referenced this issue Dec 10, 2020
…ibson semantic scenes (facebookresearch#407)

To leverage 3dscenegraph semantic annotation spatial information added support of object's bounding boxes to Gibson semantic scenes. Related to issue facebookresearch#374 and depends on PR facebookresearch#406.
luoj1 pushed a commit to luoj1/habitat-sim that referenced this issue Aug 16, 2022
…#430)

This is to implement the Encoder-Decoder CNN feature extractor to be used in the EQA baseline implementation, as in facebookresearch#374. Feature extraction from scene images is the first part of each of the subsequent trainers (VQA, PACMAN) in the EQA implementation.

Implementation based on EmbodiedQA, Das et al, CVPR 2018 (paper, code)
@aclegg3 added the enhancement (New feature or request) label Aug 31, 2022