RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces
Authors: Sébastien Ehrhardt, Oliver Groth, Áron Monszpart, Martin Engelcke, Ingmar Posner, Niloy J. Mitra, Andrea Vedaldi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects. Similar to other generative approaches, RELATE is trained end-to-end on raw, unlabeled data. ... We find that RELATE is also amenable to physically realistic scene editing and that it significantly outperforms prior art in object-centric scene generation in both synthetic (CLEVR, ShapeStacks) and real-world data (cars). ... We demonstrate the efficacy of RELATE in several scenarios, including balls rolling in bowls of variable shape [6], cluttered tabletops (CLEVR [16]), block stacking (ShapeStacks [12]), and videos of traffic at busy intersection. By ablating the interaction module, we show that modeling the spatial correlation between the objects is key. Furthermore, we compare RELATE to several recent GAN- and VAE-based baselines, including BlockGAN [29], GENESIS [7] and OCF [1], in terms of Fréchet Inception Distance (FID) [13], and outperform even the best state-of-the-art model by up to 29 points. |
| Researcher Affiliation | Collaboration | Sébastien Ehrhardt¹, Oliver Groth¹, Áron Monszpart²˒³, Martin Engelcke¹, Ingmar Posner¹, Niloy J. Mitra²˒⁴, Andrea Vedaldi¹. ¹Department of Engineering Science, University of Oxford; ²Department of Computer Science, University College London; ³Niantic; ⁴Adobe Research. {hyenal,ogroth}@robots.ox.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | Source code, datasets and more results are available at http://geometry.cs.ucl.ac.uk/projects/2020/relate/. |
| Open Datasets | Yes | Source code, datasets and more results are available at http://geometry.cs.ucl.ac.uk/projects/2020/relate/. We conduct experiments on four different datasets. First, we consider a relatively simple dataset, BallsInBowl from [6]... To this, we add two popular synthetic datasets CLEVR [16] (cluttered tabletops) and ShapeStacks [12] (block stacking). Finally, we collected a new dataset RealTraffic containing five hours of footage of a busy street intersection, divided into fragments containing from one to six cars. |
| Dataset Splits | No | The paper mentions training on datasets and evaluating on test sets, but it does not provide specific details on training/validation/test dataset splits (exact percentages, sample counts, or explicit splitting methodology) in the main text. |
| Hardware Specification | No | The paper mentions using "Hartree Centre resources" and "University of Oxford Advanced Research Computing (ARC) facility" but does not provide specific details such as GPU or CPU models, processor types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and the AdaIN architecture, but it does not provide specific version numbers for any software libraries or dependencies, such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We learn mappings Ψb and Ψf using the same Adaptive Instance Normalization (AdaIN) [14] architecture. The spatial size of their output tensors is set to H = 16 and the final output image to 128 × 128 (which is reduced when needed for fair comparison to other methods). We used the Adam [18] optimizer for learning and train for a fixed number of epochs and always select the last model snapshot. |
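
The Experiment Setup row names Adaptive Instance Normalization (AdaIN) without describing it. As a rough illustration only, the generic AdaIN operation can be sketched as below: each channel of a feature map is normalized to zero mean and unit variance, then rescaled and shifted by per-channel parameters. This is not RELATE's generator code. The function name `adain`, the nested-list tensor layout, and the way `scale` and `shift` are supplied are assumptions made for the example.

```python
import math


def adain(feature_map, scale, shift, eps=1e-5):
    """Generic AdaIN for one sample (illustrative, not the paper's code).

    feature_map: C channels, each an H x W nested list of floats.
    scale, shift: per-channel floats (length C); how they are produced
    (e.g. from a latent code) is left to the caller and is an assumption.
    """
    out = []
    for channel, gamma, beta in zip(feature_map, scale, shift):
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)  # eps guards against division by zero
        # Normalize each activation, then apply the channel's scale and shift.
        out.append([[gamma * (v - mean) / std + beta for v in row]
                    for row in channel])
    return out


# Toy usage: a single 2 x 2 channel restyled to mean 10 and std close to 2.
features = [[[1.0, 2.0], [3.0, 4.0]]]
styled = adain(features, scale=[2.0], shift=[10.0])
```

After the call, each output channel has mean equal to its `shift` value and standard deviation approximately equal to its `scale` value, regardless of the statistics of the input channel.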