This is my Bachelor's thesis at UPT, called YOVO: You Only Voxelize Once - state-of-the-art* 3D object reconstruction from a single 2D image. My thesis supervisor was Assoc. Prof. Dr. Eng. Calin-Adrian Popa.
The name YOVO: You Only Voxelize Once derives from the fact that it uses only one input image to reconstruct a 3D voxelized representation of the presented object. It is inspired by Pix2Vox and reinvents the Autoencoder module, introducing multi-level feature extraction, which leads to multiple volume reconstructions at different levels of abstraction. Moreover, by using a MobileNetV2 backbone, Mish activations, DropBlock regularization, and the Ranger optimizer, YOVO surpasses the SotA on the ShapeNet subset Data3D-R2N2, currently* held by Pix2Vox-A.
(* as of March 2020)
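Of the ingredients listed above, the Mish activation is simple enough to sketch directly. Below is a minimal PyTorch version of the standard formula `x * tanh(softplus(x))`; this is an illustration of the activation itself, not necessarily the exact module used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: f(x) = x * tanh(softplus(x)).

    Smooth and non-monotonic; unlike ReLU it lets small negative
    values through, which can help gradient flow.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(F.softplus(x))

# Quick sanity check: Mish passes through the origin and
# preserves the sign of small negative inputs.
act = Mish()
out = act(torch.tensor([0.0, 1.0, -1.0]))
```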
The YOVO architecture comes in 3 variants:
- YOVO : the classic version, which introduces multi-level feature processing along with several other techniques
- YOVO-s : a simplified version that removes the Refiner and extends the Decoder
- YOVO-e : an extended version that enlarges both the Refiner and the Decoder
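The three variants correspond to the `YOVO_VERSION` values accepted by the config (`classic`, `simple`, `extended`). A hypothetical dispatch, purely for illustration of the mapping (the function name and descriptions are mine, not from the repo), could look like:

```python
# Hypothetical sketch of how a runner might map the config value
# __C.NETWORK.YOVO_VERSION to a network variant.
def select_variant(version: str) -> str:
    variants = {
        'classic':  'YOVO (multi-level features, Refiner + Decoder)',
        'simple':   'YOVO-s (no Refiner, extended Decoder)',
        'extended': 'YOVO-e (extended Refiner and Decoder)',
    }
    if version not in variants:
        raise ValueError(f'Unknown YOVO version: {version}')
    return variants[version]
```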
In-depth details are presented after the How to Run section.
All three variants of YOVO surpass the SotA results of Pix2Vox-A.
Here are some experimental results:
The ShapeNet dataset is used to sample the Data3D-R2N2 subset, which is used in the experiments. The download links are available below:
- ShapeNet rendering images: Renderings
- ShapeNet voxelized models: 3D voxelized models
The original YOVO model can be found at: YOVO. The YOVO-s and YOVO-e models can be found at: YOVO-s / YOVO-e.
```shell
git clone https://github.com/caiusdebucean/YOVO.git
cd YOVO
pip install -r requirements.txt
```
The code is heavily inspired by Pix2Vox. Credits to the original creators.
Dataset location:
```python
__C.DATASETS.SHAPENET.RENDERING_PATH = '/path/to/Datasets/ShapeNet/ShapeNetRendering/%s/%s/rendering/%02d.png'
__C.DATASETS.SHAPENET.VOXEL_PATH = '/path/to/Datasets/ShapeNet/ShapeNetVox32/%s/%s/model.binvox'
```
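The `%s/%s/%02d` placeholders in the rendering template are filled with a class (taxonomy) ID, a model ID, and a view index via printf-style formatting. A sketch of the expansion, using made-up placeholder IDs:

```python
# The template mirrors the RENDERING_PATH config entry above;
# 'class_id' and 'model_id' are hypothetical placeholders.
rendering_tpl = '/path/to/Datasets/ShapeNet/ShapeNetRendering/%s/%s/rendering/%02d.png'
path = rendering_tpl % ('class_id', 'model_id', 0)  # view index 0 -> '00.png'
```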
YOVO Architecture:
```python
__C.NETWORK.YOVO_VERSION = 'classic' # [classic, simple, extended, custom]
```
Visualizing results:
```python
__C.TEST.VIEW_KAOLIN = True # Rendering with kaolin. This should be done locally, not through ssh
__C.TEST.SAVE_RENDERED_IMAGE = True # Save the input preprocessed image containing the object
__C.TEST.NO_OF_RENDERS = 1 # How many examples/class to save for visualization during test/validation
__C.TEST.SAVE_GIF = True # Save a GIF of a 360-degree rotation of the volume for the saved objects
__C.TEST.RENDER_THRESHOLD = 0.85 # How confident the saved predictions should be
__C.TEST.GENERATE_MULTILEVEL_VOLUMES = True # Generate the reconstructed volumes at the autoencoder level
```
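The `RENDER_THRESHOLD` setting gates which voxels appear in a rendering: the network predicts a per-voxel occupancy probability in [0, 1], and only voxels at or above the threshold are kept. A toy NumPy illustration of this binarization (not the repo's actual rendering code):

```python
import numpy as np

RENDER_THRESHOLD = 0.85  # same value as in the config above

# Toy 2x2 slice of predicted occupancy probabilities
# (a real prediction would be a 32x32x32 volume).
pred = np.array([[0.20, 0.90],
                 [0.95, 0.50]])

# Keep only confident voxels: 1 = occupied, 0 = empty.
binary = (pred >= RENDER_THRESHOLD).astype(np.uint8)
```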
To train YOVO, run the following command in the root folder:
```shell
python3 runner.py --name custom_name
```
To test YOVO, run the following command in the root folder:
```shell
python3 runner.py --name custom_name --test --weights=/path/model.pth
```
The full architecture can be seen here:
This project is open-sourced under the MIT license and is intended for scientific purposes only; it contains my attempts at pushing the results to a new SotA and testing various methods.
More documentation is coming soon, as the project is still in development and alternative techniques/ablation studies remain to be explored. Stay tuned!
All illustrations are original content and should be credited accordingly.