Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer

Authors: Wenzheng Chen, Huan Ling, Jun Gao, Edward Smith, Jaakko Lehtinen, Alec Jacobson, Sanja Fidler

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We showcase our approach in two ML applications: single-image 3D object prediction, and 3D textured object generation, both trained using exclusively 2D supervision. Our project website is: https://nv-tlabs.github.io/DIB-R/. We demonstrate the effectiveness of our framework through three challenging ML applications, across which we achieve both numerical and visual state-of-the-art results.
Researcher Affiliation | Collaboration | NVIDIA¹, University of Toronto², Vector Institute³, McGill University⁴, Aalto University⁵; {wenzchen, huling, jung, esmith, jlehtinen, sfidler}@nvidia.com, jacobson@cs.toronto.edu
Pseudocode | No | No explicit pseudocode or algorithm block was found. The method is described mathematically through equations (1, 2, 4, 5, 6) and in prose.
Open Source Code | Yes | Our project website is: https://nv-tlabs.github.io/DIB-R/
Open Datasets | Yes | Dataset: As in [14, 20, 33], our dataset comprises 13 object categories from the ShapeNet dataset [3]. Following CMR [13], we adopt CUB bird dataset [35] and PASCAL3D+ car dataset [38].
Dataset Splits | Yes | We use the same split of objects into our training and test set as [33].
Hardware Specification | No | No specific hardware details (e.g., GPU model, CPU type, memory) are mentioned for the experimental setup. The paper does not specify the computing environment used for training or inference.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x) are explicitly listed in the paper.
Experiment Setup | Yes | In our experiments, we set λ_col = 1, λ_sm = 0.001, and λ_lap = 0.01. The network is optimized using the Adam optimizer [15], with α = 0.0001, β_1 = 0.9, and β_2 = 0.999. The batch size is 64, and the dimension of the input image is 64 × 64. We set λ_adv = 0.5, λ_gp = 0.5, and λ_per = 1. We fix the learning rate for the discriminator to 1e-5 and optimize using Adam [15], with α = 0.0001, β_1 = 0.5, and β_2 = 0.999.
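
Since the paper lists concrete hyperparameter values but no software stack, the following is a minimal, hypothetical sketch (assuming PyTorch, which the table above does not confirm) of how the values quoted in the Experiment Setup row could be declared. The network, discriminator, and loss terms are placeholders, not the authors' implementation; which loss term each λ multiplies is defined by the paper's equations and is not asserted here.

```python
import torch

# Loss weights quoted in the Experiment Setup row. The exact set of loss
# terms each weight multiplies is given in the paper's equations.
LAMBDA_COL, LAMBDA_SM, LAMBDA_LAP = 1.0, 0.001, 0.01   # prediction-side weights
LAMBDA_ADV, LAMBDA_GP, LAMBDA_PER = 0.5, 0.5, 1.0      # GAN-side weights

BATCH_SIZE = 64
IMAGE_SIZE = 64  # input images are 64 x 64

# Placeholder modules standing in for the mesh-prediction network and the
# discriminator; the real architectures are described in the paper.
model = torch.nn.Linear(IMAGE_SIZE * IMAGE_SIZE * 3, 8)
discriminator = torch.nn.Linear(8, 1)

# Adam settings from the quote: alpha = 1e-4 with betas (0.9, 0.999) for the
# predictor; learning rate fixed to 1e-5 with betas (0.5, 0.999) for the
# discriminator.
optim_g = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
optim_d = torch.optim.Adam(discriminator.parameters(), lr=1e-5, betas=(0.5, 0.999))

def weighted_loss(terms, weights):
    """Generic weighted sum of loss tensors, e.g.
    weighted_loss([l_col, l_sm, l_lap], [LAMBDA_COL, LAMBDA_SM, LAMBDA_LAP])."""
    return sum(w * t for t, w in zip(terms, weights))
```

A sketch like this only pins down the reported scalar settings; reproducing the results would additionally require the differentiable renderer, the loss definitions, and the data pipeline from the code released at https://nv-tlabs.github.io/DIB-R/.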