Learning 3D Particle-based Simulators from RGB-D Videos
Authors: William F Whitney, Tatiana Lopez-Guevara, Tobias Pfaff, Yulia Rubanova, Thomas Kipf, Kim Stachenfeld, Kelsey R Allen
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test VPD on three datasets which stress different simulator capabilities. The MuJoCo block dataset (Todorov et al., 2012) is visually simple but tests a model's ability to accurately represent crisp rigid contact (Allen et al., 2022). The Kubric datasets (Greff et al., 2022) encompass a range of visual complexities, from Platonic solids to densely-textured scans of real objects and backgrounds, and test a model's ability to represent multi-object interactions in varied visual environments. The deformable dataset evaluates a model's ability to represent the dynamics of non-rigid objects with a large number of degrees of freedom. In all cases, the models are provided with RGB-D views from multiple cameras. For evaluation, 16 trajectories are chosen at random and held out from each dataset, and we report each model's PSNR (with SSIM in Appendix E) (Wang et al., 2004). (A minimal PSNR sketch follows the table.) |
| Researcher Affiliation | Industry | William F. Whitney, Tatiana Lopez-Guevara, Tobias Pfaff, Yulia Rubanova, Thomas Kipf, Kimberly Stachenfeld, Kelsey R. Allen; Google DeepMind |
| Pseudocode | Yes | Algorithm 1 in Appendix A details the entire message passing algorithm. ... Algorithm 1: Hierarchical message passing |
| Open Source Code | No | The paper does not provide an explicit statement of open-source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | The MuJoCo block dataset (Todorov et al., 2012); the Kubric datasets (Greff et al., 2022); and Deformables, a dataset of deformable objects simulated and rendered using Blender (Blender, 2018) softbody physics. |
| Dataset Splits | No | The paper mentions 'training trajectories' and 'held-out trajectories' for evaluation but does not explicitly describe a separate 'validation' dataset split. |
| Hardware Specification | No | The paper mentions 'We use a batch size of 16 split across 16 GPUs' but does not specify the exact GPU models or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions software components like 'UNet architecture', 'Adam optimizer', and 'jaxnerf codebase' but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | During training, we roll out the model for T = 6 time steps. For each unrolled time step, we render 256 rays, and supervise on the corresponding ground truth pixel value as in Equation 3. We apply a small amount of Gaussian noise centered at 0 and with sigma given by location noise in Table 3 to the particle locations during training rollouts to improve robustness. We use a batch size of 16 split across 16 GPUs. Optimization uses the Adam optimizer (Kingma & Ba, 2014) with a learning rate that begins at 3e-4, then decays by a factor of 3 at 100K and 300K updates. Models are trained for 400K updates. (A sketch of this schedule and the noise injection follows the table.) |
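
The evaluation row above reports PSNR on rendered frames from held-out trajectories. The snippet below is a minimal sketch of the standard PSNR computation, not the authors' evaluation code; the function name and the assumption that images are floats in [0, max_val] are ours.

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a rendered frame and ground truth.

    Assumes both images are float arrays with values in [0, max_val].
    """
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Illustrative usage (shapes and data loading are hypothetical):
# scores = [psnr(rendered_frame, gt_frame) for rendered_frame, gt_frame in frame_pairs]
# print(np.mean(scores))
```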
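The Experiment Setup row describes a step-decay learning-rate schedule (start at 3e-4, decay by a factor of 3 at 100K and 300K updates, 400K updates total) and Gaussian noise added to particle locations during training rollouts. The sketch below is our reconstruction of that description, not the authors' code: the cumulative interpretation of the two decay steps is an assumption, and `location_noise` stands in for the per-dataset sigma the paper lists in its Table 3.

```python
import numpy as np

def learning_rate(step: int, base_lr: float = 3e-4) -> float:
    """Step-decay schedule as described: start at 3e-4 and decay by a
    factor of 3 at 100K and again at 300K updates (assumed cumulative)."""
    lr = base_lr
    if step >= 100_000:
        lr /= 3.0
    if step >= 300_000:
        lr /= 3.0
    return lr

def noisy_locations(locations: np.ndarray, location_noise: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise to particle locations during training
    rollouts; `location_noise` is the dataset-specific sigma."""
    return locations + rng.normal(0.0, location_noise, size=locations.shape)
```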