📖 VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud (CVPR 2023 Highlight)

🔥 If you find the training scheme in VL-SAT useful, please ⭐ the repository or recommend it to your friends. Thanks! 🔥

Introduction

This repository releases the code for our paper, VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud (CVPR 2023 Highlight).

Authors: Ziqin Wang, Bowen Cheng, Lichen Zhao, Dong Xu, Yang Tang, Lu Sheng* (*corresponding author)

[arxiv] [code] [checkpoint]

Dependencies

conda create -n vlsat python=3.8
conda activate vlsat
pip install -r requirement.txt
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.12.1+cu113.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.12.1+cu113.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.12.1+cu113.html
pip install torch-geometric
pip install git+https://github.com/openai/CLIP.git
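
After running the install commands above, it can be useful to confirm that every package is visible to the interpreter in the `vlsat` environment. The following is a minimal sketch, not part of the repository: the helper name `check_modules` is hypothetical, and the import names in `REQUIRED_MODULES` are assumed to match the pip packages listed above.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the pip packages installed above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "torchaudio",
    "torch_scatter",
    "torch_sparse",
    "torch_spline_conv",
    "torch_geometric",
    "clip",
]


def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: bool} without actually importing the modules."""
    status = {}
    for name in names:
        try:
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat any lookup failure as "not available".
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name:20s} {'found' if ok else 'MISSING'}")
```

This only checks that the modules can be located; it does not verify versions or that the CUDA build matches your driver.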

Prepare the data

A. Download 3RScan and the 3DSSG-Sub annotations; you can follow the instructions in 3DSSG.

B. Generate 2D multi-view images

# Modify the paths in pointcloud2image.py to point to your own data first
python data/pointcloud2image.py

C. Arrange the files as follows:

data
  3DSSG_subset
    relations.txt
    classes.txt
    
  3RScan
    0a4b8ef6-a83a-21f2-8672-dce34dd0d7ca
      multi_view
      labels.instances.align.annotated.v2.ply
    ...
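
The layout above can be checked before training with a short script. This is a sketch under assumptions, not part of the repository: the function name `check_layout` is hypothetical, it assumes every sub-folder of `3RScan` is a scan, and it only looks for the entries shown in the tree above.

```python
from pathlib import Path


def check_layout(root="data"):
    """Return a list of problems; an empty list means the layout matches the tree above."""
    root = Path(root)
    problems = []

    # Annotation files expected under 3DSSG_subset.
    for name in ("relations.txt", "classes.txt"):
        path = root / "3DSSG_subset" / name
        if not path.is_file():
            problems.append(f"missing file: {path}")

    # Each scan folder should hold a multi_view folder and the annotated mesh.
    scans_dir = root / "3RScan"
    scans = sorted(p for p in scans_dir.iterdir() if p.is_dir()) if scans_dir.is_dir() else []
    if not scans:
        problems.append(f"no scan folders under: {scans_dir}")
    for scan in scans:
        if not (scan / "multi_view").is_dir():
            problems.append(f"missing folder: {scan / 'multi_view'}")
        ply = scan / "labels.instances.align.annotated.v2.ply"
        if not ply.is_file():
            problems.append(f"missing file: {ply}")

    return problems


if __name__ == "__main__":
    issues = check_layout("data")
    print("layout OK" if not issues else "\n".join(issues))
```

Running it from the repository root prints either `layout OK` or one line per missing entry.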

D. Train your own CLIP adapter

python clip_adapter/main.py

or simply use the checkpoint at

clip_adapter/checkpoint/origin_mean.pth

Run Code

# Train
python -m main --mode train --config <config_path> --exp <exp_name>
# Eval
python -m main --mode eval --config <config_path> --exp <exp_name>

A default config is provided in this repo.
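
To run training followed by evaluation in one go, the two commands above can be wrapped in a small driver. This is only a sketch: `build_command` and `train_then_eval` are hypothetical helpers, not part of the repository, and it assumes the script is launched from the repository root so that `python -m main` resolves to `main.py`. Whether evaluation reuses the run stored under the same `--exp` name is also an assumption.

```python
import subprocess
import sys

VALID_MODES = ("train", "eval")


def build_command(mode, config_path, exp_name):
    """Build the argument list for `python -m main` using the flags shown above."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [
        sys.executable, "-m", "main",
        "--mode", mode,
        "--config", config_path,
        "--exp", exp_name,
    ]


def train_then_eval(config_path, exp_name):
    """Run training, then evaluation; stop if either step exits with an error."""
    for mode in VALID_MODES:
        subprocess.run(build_command(mode, config_path, exp_name), check=True)
```

Call `train_then_eval(<config_path>, <exp_name>)` with your own config path and experiment name; `check=True` makes a failed training run abort before evaluation starts.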

Paper

If you find the code useful, please consider citing our paper:

@article{wang2023vl,
  title={VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud},
  author={Wang, Ziqin and Cheng, Bowen and Zhao, Lichen and Xu, Dong and Tang, Yang and Sheng, Lu},
  journal={arXiv preprint arXiv:2303.14408},
  year={2023}
}

Acknowledgement

This repository is partly based on the 3DSSG and CLIP repositories.
