Deep Single-View 3D Object Reconstruction with Visual Hull Embedding

Authors: Hanqing Wang, Jiaolong Yang, Wei Liang, Xin Tong (pp. 8941-8948)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both synthetic data and real images show that embedding a single-view visual hull for shape refinement can significantly improve the reconstruction quality by recovering more shape details and improving shape consistency with the input image.
Researcher Affiliation | Collaboration | Beijing Institute of Technology; Microsoft Research Asia
Pseudocode | No | The paper does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | We use the ShapeNet object images rendered by (Choy et al. 2016) to train and test our method. We then use the PASCAL 3D+ dataset (Xiang, Mottaghi, and Savarese 2014) to evaluate our method on real images with pseudo ground-truth shapes.
Dataset Splits | Yes | Following (Choy et al. 2016), we use 80% of the 3D models for training and the remaining 20% for testing.
Hardware Specification | Yes | For a batch of 24 input images, the forward pass of our whole network takes 0.44 seconds on an NVIDIA Tesla M40 GPU, i.e., our network processes one image in 18 milliseconds on average.
Software Dependencies | No | The paper mentions "Our network is implemented in TensorFlow" but does not specify a version number for TensorFlow or any other software dependency.
Experiment Setup | Yes | The input image size is 128x128 and the output voxel grid size is 32x32x32. A batch size of 24 and the ADAM solver are used throughout training. We use a learning rate of 1e-4 for S-Net, V-Net and R-Net and divide it by 10 at the 20K-th and 60K-th iterations. The learning rate for P-Net is 1e-5 and is dropped by 10 at the 60K-th iteration. When finetuning all the subnets together, the learning rate is 1e-5 and is dropped by 10 at the 20K-th iteration.
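
The training setup quoted in the Experiment Setup row amounts to Adam with simple step-decay learning-rate schedules per subnet. Below is a minimal TensorFlow 2.x sketch of those schedules, assuming tf.keras optimizers; the numeric constants mirror the quoted values, while the variable names and structure are illustrative assumptions, not the authors' (unreleased) code.

```python
# Hedged sketch of the reported training hyperparameters; not the authors' implementation.
import tensorflow as tf

BATCH_SIZE = 24            # images per batch, as reported
INPUT_SIZE = (128, 128)    # input image resolution
VOXEL_GRID = (32, 32, 32)  # output occupancy grid resolution

# S-Net / V-Net / R-Net: lr 1e-4, divided by 10 at the 20K-th and 60K-th iterations.
svr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[20_000, 60_000], values=[1e-4, 1e-5, 1e-6])
svr_optimizer = tf.keras.optimizers.Adam(learning_rate=svr_schedule)

# P-Net: lr 1e-5, dropped by 10 at the 60K-th iteration.
p_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[60_000], values=[1e-5, 1e-6])
p_optimizer = tf.keras.optimizers.Adam(learning_rate=p_schedule)

# Joint finetuning of all subnets: lr 1e-5, dropped by 10 at the 20K-th iteration.
joint_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[20_000], values=[1e-5, 1e-6])
joint_optimizer = tf.keras.optimizers.Adam(learning_rate=joint_schedule)
```

In such a setup, each optimizer would be applied to the variables of its corresponding subnet during that subnet's training stage, with the joint optimizer used only for the final end-to-end finetuning pass.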