Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
Authors: Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the ability of the model in generating 3D volume from a single 2D image with three sets of experiments: (1) learning from single-class objects; (2) learning from multi-class objects; and (3) testing on novel object classes. Results show superior performance and better generalization ability for 3D object reconstruction when the projection loss is involved. We conduct experimental evaluations using a subset of 3D models from ShapeNetCore [1]. Results from single-class and multi-class training demonstrate excellent performance of our network for volumetric 3D reconstruction. |
| Researcher Affiliation | Collaboration | 1University of Michigan, Ann Arbor 2Adobe Research 3Google Brain |
| Pseudocode | No | The paper describes procedures, including the perspective transformer and projection loss, using mathematical equations but does not present a formal pseudocode or algorithm block (a hedged code sketch of the projection operation appears below the table). |
| Open Source Code | Yes | To download the code, please refer to the project webpage: http://goo.gl/YEJ2H6. |
| Open Datasets | Yes | ShapeNetCore. This dataset contains about 51,300 unique 3D models from 55 common object categories [1]. [1] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015. |
| Dataset Splits | No | For the multi-category experiment, the training set includes 13 major categories: airplane, bench, dresser, car, chair, display, lamp, loudspeaker, rifle, sofa, table, telephone and vessel. Basically, we preserved 20% of instances from each category as testing data. |
| Hardware Specification | No | We acknowledge NVIDIA for the donation of GPUs. |
| Software Dependencies | Yes | The models including the perspective transformer nets are implemented using Torch [3]. [3] R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like environment for machine learning. In BigLearn, NIPS Workshop, number EPFL-CONF-192376, 2011. |
| Experiment Setup | Yes | Implementation Details. We used the ADAM [7] solver for stochastic optimization in all the experiments. During the pre-training stage (for the encoder), we used mini-batches of size 32, 32, 8, 4, 3 and 2 for training RNN-1, RNN-2, RNN-4, RNN-8, RNN-12 and RNN-16, as used in Yang et al. [23]. We used a learning rate of 10^-4 for RNN-1, and 10^-5 for the rest of the recurrent neural networks. During the fine-tuning stage (for the volume decoder), we used a mini-batch of size 6 and a learning rate of 10^-4. For each object in a mini-batch, we include projections from all 24 views as supervision. (A hedged configuration sketch of this two-stage schedule also appears below the table.) |
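
As noted in the Pseudocode row, the paper specifies the perspective transformer only through equations. The following is a minimal PyTorch sketch of that projection operation, assuming a 4x4 homogeneous camera matrix, trilinear resampling via `grid_sample`, and an L2 projection loss; the grid resolution, the clamp epsilon, and the function names are illustrative choices, not details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

def perspective_transform(volume, theta, out_shape=(32, 32, 32)):
    """Resample a 3D occupancy volume into the camera frame.

    volume: (N, 1, D, H, W) occupancies in [0, 1].
    theta:  (N, 4, 4) homogeneous camera matrices (assumed form).
    """
    N = volume.size(0)
    D, H, W = out_shape
    # Regular target grid over the camera-frame volume in [-1, 1]^3.
    zs, ys, xs = (torch.linspace(-1.0, 1.0, s) for s in (D, H, W))
    z, y, x = torch.meshgrid(zs, ys, xs, indexing="ij")
    grid = torch.stack([x, y, z, torch.ones_like(x)], dim=-1)  # homogeneous
    grid = grid.view(1, -1, 4).expand(N, -1, -1)
    # Map each target point back into the source volume, applying the
    # perspective divide by the fourth homogeneous coordinate.
    src = torch.bmm(grid, theta.transpose(1, 2))
    src = src[..., :3] / src[..., 3:].clamp(min=1e-6)  # guard the divide
    src = src.view(N, D, H, W, 3)
    # Trilinear sampling of the source volume at the projected locations.
    return F.grid_sample(volume, src, align_corners=True)

def projected_silhouette(volume, theta):
    # Max projection along the depth axis: S(x, y) = max_z V'(z, y, x).
    return perspective_transform(volume, theta).max(dim=2).values

def projection_loss(pred_volume, theta, target_mask):
    # L2 loss between the projected silhouette and the ground-truth 2D mask.
    return F.mse_loss(projected_silhouette(pred_volume, theta), target_mask)

# Toy usage: identity camera on a random volume.
vol = torch.rand(2, 1, 32, 32, 32)
eye = torch.eye(4).repeat(2, 1, 1)
mask = torch.rand(2, 1, 32, 32)
print(projection_loss(vol, eye, mask).item())
```

Because the silhouette is obtained by differentiable sampling and a max over depth, gradients flow from the 2D mask loss back into the predicted volume, which is what lets the network train without 3D supervision.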
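
The Experiment Setup row above maps onto a small configuration sketch. Only the batch sizes, learning rates, and view count below come from the paper; the use of PyTorch's `Adam` (in place of the authors' Torch/Lua setup) and the placeholder `make_optimizer` helper are assumptions.

```python
import torch

# Stage 1: encoder pre-training; one setting per recurrent model,
# following the batch sizes and learning rates quoted above.
PRETRAIN = {
    "RNN-1":  {"batch_size": 32, "lr": 1e-4},
    "RNN-2":  {"batch_size": 32, "lr": 1e-5},
    "RNN-4":  {"batch_size": 8,  "lr": 1e-5},
    "RNN-8":  {"batch_size": 4,  "lr": 1e-5},
    "RNN-12": {"batch_size": 3,  "lr": 1e-5},
    "RNN-16": {"batch_size": 2,  "lr": 1e-5},
}

# Stage 2: volume-decoder fine-tuning with all 24 views per object.
FINETUNE = {"batch_size": 6, "lr": 1e-4, "views_per_object": 24}

def make_optimizer(model, cfg):
    # ADAM is the stochastic optimizer in all of the paper's experiments.
    return torch.optim.Adam(model.parameters(), lr=cfg["lr"])
```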