Robust Inverse Graphics via Probabilistic Inference

Authors: Tuan Anh Le, Pavel Sountsov, Matthew Douglas Hoffman, Ben Lee, Brian Patton, Rif A. Saurous

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the RIG approach on 3D datasets with a number of prior and NeRF representations, across a number of possible corruptions. We empirically show that full probabilistic inference produces better results than point estimates on the monocular depth
Researcher Affiliation | Industry | Google. Correspondence to: Tuan Anh Le <tuananhl@google.com>, Pavel Sountsov <siege@google.com>.
Pseudocode | Yes | Algorithm 1: Reconstruction-guidance diffusion conditioning with auxiliary latents (ReGAL)
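The full ReGAL algorithm is given as pseudocode in the paper. As a rough illustration of the general idea behind reconstruction-guidance diffusion conditioning (not the paper's actual ReGAL components), the sketch below shows one guided denoising step for a toy linear observation model `y ≈ A @ x0`; the `denoise` function, observation matrix `A`, and `weight` are hypothetical stand-ins:

```python
import numpy as np

def guided_denoising_step(x_t, sigma_t, sigma_next, y, A, denoise, weight=1.0):
    """One reconstruction-guided denoising step (illustrative sketch only).

    `denoise`, `A`, and `weight` are hypothetical stand-ins, not the paper's
    ReGAL components. The observation model is assumed linear: y ~ A @ x0.
    """
    x0_hat = denoise(x_t, sigma_t)                   # predicted clean latent
    residual = y - A @ x0_hat                        # observation mismatch
    grad = A.T @ residual                            # ascent direction on log p(y | x0_hat)
    x0_guided = x0_hat + weight * sigma_t**2 * grad  # nudge prediction toward the observation
    eps = (x_t - x0_guided) / sigma_t                # implied noise direction
    return x0_guided + sigma_next * eps              # DDIM-style update to the next noise level
```

With `sigma_next = 0` this returns the guided clean-signal estimate directly; the paper's actual conditioning additionally handles auxiliary latents, which this sketch omits.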
Open Source Code | Yes | The source code for many of these experiments is available at https://github.com/tensorflow/probability/tree/main/discussion/robust_inverse_graphics.
Open Datasets | Yes | Datasets: We evaluate our method on two datasets. For the first dataset, we use the cars category from ShapeNet (Chang et al., 2015). For the second dataset, we use the MultiShapeNet (MSN) dataset (Sajjadi et al., 2022).
Dataset Splits | Yes | The dataset consists of 3486 cars, where 3137 are used for training and the remaining 349 for evaluation.
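The reported counts are internally consistent (3137 + 349 = 3486). A minimal sketch of a deterministic index split with those sizes, assuming a fixed scene ordering (the paper does not specify how scenes are ordered or whether they are shuffled):

```python
def shapenet_cars_split(n_total=3486, n_train=3137):
    """Split scene indices into train/eval lists matching the reported counts.

    Assumes a fixed ordering of the 3486 ShapeNet car scenes; the actual
    ordering used in the paper is not specified here.
    """
    ids = list(range(n_total))
    return ids[:n_train], ids[n_train:]
```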
Hardware Specification | Yes | On a single A100 GPU, for each image it takes 9.5 minutes for MultiShapeNet and 7.5 minutes for ShapeNet to run ReGAL for 2000 steps to generate 8 particles.
Software Dependencies | No | The paper mentions using the Adam optimizer and refers to RealNVP, but does not provide specific version numbers for any software dependencies or libraries such as TensorFlow, PyTorch, or Python.
Experiment Setup | Yes | ProbNeRF: We train for 2 * 10^6 steps. We use the Adam (Kingma & Ba, 2017) optimizer with a learning rate schedule where we warm up the learning rate from 0 to 10^-4 over 50 steps, and then step-wise halve it every 50000 steps afterward. We used a minibatch of 8 scenes. For the guide, we use 10 random views per scene. SSDNeRF: We train for 5 * 10^5 steps. For ShapeNet, we use the Adam optimizer with a learning rate schedule where we warm up the learning rate from 0 to 10^-3 and then step-wise halve it every 125000 steps afterward.
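The ProbNeRF learning-rate schedule quoted above (linear warmup over 50 steps to 10^-4, then a step-wise halving every 50000 steps) can be sketched as a plain function; the function name and defaults are illustrative, not taken from the paper's code:

```python
def lr_schedule(step, peak_lr=1e-4, warmup_steps=50, halve_every=50_000):
    """Learning rate at a given training step (sketch of the quoted schedule).

    Linearly warms up from 0 to `peak_lr` over `warmup_steps`, then halves
    step-wise every `halve_every` steps afterward.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * 0.5 ** ((step - warmup_steps) // halve_every)
```

The SSDNeRF schedule would use `peak_lr=1e-3` and `halve_every=125_000`; the paper excerpt does not state its warmup length.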