Unsupervised Learning of Shape and Pose with Differentiable Point Clouds

Authors: Eldar Insafutdinov, Alexey Dosovitskiy

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed approach on the task of estimating the shape and the camera pose from a single image of an object. The method successfully learns to predict both the shape and the pose, with only a minor performance drop relative to a model trained with ground truth camera poses. The point-cloud-based formulation allows for effective learning of high-fidelity shape models when provided with images of sufficiently high resolution as supervision. We demonstrate learning point clouds from silhouettes and augmenting those with color if color images are available during training. Finally, we show how the point cloud representation allows to automatically discover semantic correspondences between objects.
Researcher Affiliation | Collaboration | Eldar Insafutdinov, Max Planck Institute for Informatics (eldar@mpi-inf.mpg.de); Alexey Dosovitskiy, Intel Labs (adosovitskiy@gmail.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The project website with code can be found at https://eldar.github.io/PointClouds/.
Open Datasets | Yes | Datasets. We conduct the experiments on 3D models from the ShapeNet [3] dataset. We focus on 3 categories typically used in related work: chairs, cars, and airplanes. We follow the train/test protocol and the data generation procedure of Tulsiani et al. [20]: split the models into training, validation and test sets and render 5 random views of each model with random light source positions and random camera azimuth and elevation, sampled uniformly from [0°, 360°) and [-20°, 40°] respectively.
Dataset Splits | Yes | Datasets. We conduct the experiments on 3D models from the ShapeNet [3] dataset. We focus on 3 categories typically used in related work: chairs, cars, and airplanes. We follow the train/test protocol and the data generation procedure of Tulsiani et al. [20]: split the models into training, validation and test sets and render 5 random views of each model with random light source positions and random camera azimuth and elevation, sampled uniformly from [0°, 360°) and [-20°, 40°] respectively.
Hardware Specification | No | The paper mentions GPU memory (12 GB) but does not provide specific details on the GPU model, CPU, or other hardware used for the experiments. It only states that a configuration "does not fit into 12Gb of GPU memory with our batch size".
Software Dependencies | No | The paper mentions using TensorFlow [1] and the Adam optimizer [9] but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | Training details. We trained the networks using the Adam optimizer [9], for 600,000 mini-batch iterations. We used mini-batches of 16 samples (4 views of 4 objects). We used a fixed learning rate of 0.0001 and the standard momentum parameters. We used the fast projection in most experiments, unless mentioned otherwise. We varied both the number of points in the point cloud and the resolution of the volume used in the projection operation depending on the resolution of the ground truth projections used for supervision. We used a volume with the same side as the training samples (e.g., a 64³ volume for 64² projections), and we used 2000 points for 32² projections, 8000 points for 64² projections, and 16,000 points for 128² projections.
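To make the Research Type row more concrete, here is a minimal sketch of the kind of point-cloud-to-silhouette projection the paper builds on: points are scattered into a voxel grid with trilinear weights, and the grid is collapsed along the depth axis into a soft 2D silhouette. This is an illustration in NumPy, not the authors' differentiable TensorFlow implementation; the function name, the exponential occupancy squashing, and the ray aggregation are assumptions made for this sketch.

```python
import numpy as np

def project_to_silhouette(points, grid_size=64):
    """Scatter an (N, 3) point cloud with coordinates in [0, 1)^3 into a voxel
    grid using trilinear weights, then collapse the depth axis into a soft 2D
    silhouette. Illustrative only; the paper's projection is implemented as a
    differentiable operation inside the training graph."""
    volume = np.zeros((grid_size,) * 3, dtype=np.float32)
    coords = points * (grid_size - 1)            # continuous voxel coordinates
    base = np.floor(coords).astype(int)          # lower corner of enclosing cell
    frac = coords - base                         # trilinear interpolation weights
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((frac[:, 0] if dx else 1 - frac[:, 0])
                     * (frac[:, 1] if dy else 1 - frac[:, 1])
                     * (frac[:, 2] if dz else 1 - frac[:, 2]))
                idx = np.clip(base + np.array([dx, dy, dz]), 0, grid_size - 1)
                np.add.at(volume, (idx[:, 0], idx[:, 1], idx[:, 2]), w)
    occupancy = 1.0 - np.exp(-volume)            # soft per-voxel occupancy
    # A pixel is inside the silhouette if any voxel along its ray is occupied.
    silhouette = 1.0 - np.prod(1.0 - occupancy, axis=2)
    return silhouette

# Usage: 8000 random points produce a 64x64 soft silhouette.
cloud = np.random.rand(8000, 3)
mask = project_to_silhouette(cloud, grid_size=64)
print(mask.shape)  # (64, 64)
```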
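The dataset rows quote the rendering protocol of Tulsiani et al. [20]. A hypothetical sketch of that view sampling, assuming the elevation range is [-20°, 40°] as quoted above, could look like the following; the function name and structure are illustrative.

```python
import random

def sample_view():
    """Sample one random camera pose as described in the dataset protocol:
    azimuth uniform in [0, 360) degrees, elevation uniform in [-20, 40] degrees."""
    azimuth = random.uniform(0.0, 360.0)
    elevation = random.uniform(-20.0, 40.0)
    return azimuth, elevation

# Five random views are rendered per ShapeNet model, each with a random light source.
views = [sample_view() for _ in range(5)]
print(views)
```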
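Finally, the training details quoted in the Experiment Setup row can be summarized as a configuration sketch. The dictionary below is assembled from the reported numbers; the key names and structure are assumptions, not the authors' code.

```python
# Hypothetical training configuration assembled from the reported details.
TRAIN_CONFIG = {
    "optimizer": "Adam",            # standard momentum parameters
    "learning_rate": 1e-4,          # fixed throughout training
    "iterations": 600_000,          # mini-batch iterations
    "batch_size": 16,               # 4 random views of 4 objects
    "projection": "fast",           # fast projection used unless noted otherwise
    # Point-cloud size and projection volume scale with the supervision resolution.
    "resolution_presets": {
        32:  {"num_points": 2_000,  "volume_side": 32},
        64:  {"num_points": 8_000,  "volume_side": 64},
        128: {"num_points": 16_000, "volume_side": 128},
    },
}
```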