Learning 3D Particle-based Simulators from RGB-D Videos

Authors: William F. Whitney, Tatiana Lopez-Guevara, Tobias Pfaff, Yulia Rubanova, Thomas Kipf, Kim Stachenfeld, Kelsey R. Allen

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test VPD on three datasets which stress different simulator capabilities. The MuJoCo block dataset (Todorov et al., 2012) is visually simple but tests a model's ability to accurately represent crisp rigid contact (Allen et al., 2022). The Kubric datasets (Greff et al., 2022) encompass a range of visual complexities, from Platonic solids to densely-textured scans of real objects and backgrounds, and test a model's ability to represent multi-object interactions in varied visual environments. The deformable dataset evaluates a model's ability to represent the dynamics of non-rigid objects with a large number of degrees of freedom. In all cases, the models are provided with RGB-D views from multiple cameras. For evaluation, 16 trajectories are chosen at random and held out from each dataset, and we report each model's PSNR (with SSIM in Appendix E) (Wang et al., 2004).
Researcher Affiliation | Industry | William F. Whitney, Tatiana Lopez-Guevara, Tobias Pfaff, Yulia Rubanova, Thomas Kipf, Kimberly Stachenfeld, Kelsey R. Allen; Google DeepMind
Pseudocode | Yes | Algorithm 1 in Appendix A details the entire message passing algorithm. ... Algorithm 1: Hierarchical message passing
Open Source Code | No | The paper does not provide an explicit statement of open-source code for the described methodology or a link to a code repository.
Open Datasets | Yes | The MuJoCo block dataset (Todorov et al., 2012); the Kubric datasets (Greff et al., 2022); and Deformables, a dataset of deformable objects simulated and rendered using Blender (Blender, 2018) softbody physics.
Dataset Splits | No | The paper mentions 'training trajectories' and 'held-out trajectories' for evaluation but does not explicitly describe a separate 'validation' dataset split.
Hardware Specification | No | The paper mentions 'We use a batch size of 16 split across 16 GPUs' but does not specify the exact GPU models or other detailed hardware specifications.
Software Dependencies | No | The paper mentions software components like 'UNet architecture', 'Adam optimizer', and 'jaxnerf codebase' but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | During training, we roll out the model for T = 6 time steps. For each unrolled time step, we render 256 rays and supervise on the corresponding ground-truth pixel value as in Equation 3. We apply a small amount of Gaussian noise, centered at 0 and with sigma given by the location-noise value in Table 3, to the particle locations during training rollouts to improve robustness. We use a batch size of 16 split across 16 GPUs. Optimization uses the Adam optimizer (Kingma & Ba, 2014) with a learning rate that begins at 3e-4, then decays by a factor of 3 at 100K and 300K updates. Models are trained for 400K updates.
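The quoted optimizer schedule (Adam starting at 3e-4, decaying by a factor of 3 at 100K and 300K updates) and the Gaussian location noise can be sketched as plain functions. This is an illustrative reconstruction from the quote, not the authors' code; the function names and the NumPy-based noise sampling are assumptions.

```python
import numpy as np

def learning_rate(step):
    """Step-decay schedule from the quoted setup: starts at 3e-4,
    then is divided by 3 at 100K updates and again at 300K."""
    lr = 3e-4
    if step >= 100_000:
        lr /= 3.0
    if step >= 300_000:
        lr /= 3.0
    return lr

def noisy_locations(locations, location_noise, rng):
    """Add zero-mean Gaussian noise (sigma = the location-noise value
    from Table 3) to particle locations during training rollouts."""
    return locations + rng.normal(0.0, location_noise, size=locations.shape)
```

The step decay mirrors the quoted milestones; in a JAX training loop this would typically be expressed as a piecewise-constant schedule passed to the optimizer.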
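The evaluation metric quoted in the Research Type row, PSNR, has a standard definition. Below is a minimal NumPy sketch (assuming images with pixel values in [0, 1]), not the paper's evaluation code.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE).
    Higher is better; infinite for identical images."""
    mse = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM (Wang et al., 2004), reported in the paper's Appendix E, complements PSNR by comparing local luminance, contrast, and structure rather than raw per-pixel error.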