Visual Interaction Networks: Learning a Physics Simulator from Video

Authors: Nicholas Watters, Daniel Zoran, Theophane Weber, Peter Battaglia, Razvan Pascanu, Andrea Tacchetti

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compared the VIN to a suite of baseline and competitor models, including ablation experiments. For each system we generated a dataset with 3 objects and a dataset with 6 objects. Each dataset had a training set of 2.5 × 10^5 simulations and a test set of 2.5 × 10^4 simulations, with each simulation 64 frames long. Our results show that the VIN predicts dynamics accurately, outperforming baselines on all datasets (see Figures 3 and 4).
Researcher Affiliation | Industry | Nicholas Watters, Andrea Tacchetti, Théophane Weber, Razvan Pascanu, Peter Battaglia, Daniel Zoran; DeepMind, London, United Kingdom. {nwatters, atacchet, theophane, razp, peterbattaglia, danielzoran}@google.com
Pseudocode | No | The paper describes the architecture of the Visual Interaction Network in text and diagrams, but includes no pseudocode or algorithm blocks.
Open Source Code | No | The paper encourages the reader to view the videos at https://goo.gl/RjE3ey, but does not provide a link to the source code for the methodology.
Open Datasets | Yes | We rendered the system state on top of a CIFAR-10 natural image background. We rendered natural image backgrounds online from separate training and testing CIFAR-10 sets.
Dataset Splits | Yes | For each system we generated a dataset with 3 objects and a dataset with 6 objects. Each dataset had a training set of 2.5 × 10^5 simulations and a test set of 2.5 × 10^4 simulations, with each simulation 64 frames long.
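The split sizes reported above can be sketched as a small helper. This is purely illustrative bookkeeping of the paper's stated numbers; the function name and dictionary layout are assumptions, not anything released with the paper:

```python
# Reported split sizes from the paper; helper name is illustrative only.
TRAIN_SIMS = 250_000   # 2.5 x 10^5 training simulations per dataset
TEST_SIMS = 25_000     # 2.5 x 10^4 test simulations per dataset
FRAMES_PER_SIM = 64    # each simulation is 64 frames long

def split_counts(n_objects):
    """Return simulation and frame counts for a dataset with the given
    object count (the paper uses 3 and 6 objects per system)."""
    return {
        "objects": n_objects,
        "train_sims": TRAIN_SIMS,
        "test_sims": TEST_SIMS,
        "train_frames": TRAIN_SIMS * FRAMES_PER_SIM,
        "test_frames": TEST_SIMS * FRAMES_PER_SIM,
    }

for n in (3, 6):
    print(split_counts(n))
```

Note the 10:1 ratio between training and test simulations, which holds for both the 3-object and 6-object datasets.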
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, or memory) used to run the experiments.
Software Dependencies | No | The model was trained by backpropagation with an Adam optimizer [15], but no versions of software libraries or frameworks (e.g., Python, TensorFlow) are given.
Experiment Setup | Yes | We trained the model to predict a sequence of 8 consecutive unseen future states from 6 frames of input video. Our prediction loss was a normalized weighted sum of the corresponding 8 error terms. The model was trained by backpropagation with an Adam optimizer [15]. See the Supplementary Material for full training parameters.
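The prediction objective quoted above (a normalized weighted sum of 8 per-step error terms) can be sketched as follows. The squared-error metric and uniform default weights here are assumptions for illustration; the paper's exact weighting and error terms are in its Supplementary Material:

```python
import numpy as np

HORIZON = 8  # number of consecutive unseen future states predicted

def prediction_loss(pred, target, weights=None):
    """Normalized weighted sum of per-step errors over the horizon.

    pred, target: arrays of shape (HORIZON, state_dim).
    weights: optional per-step weights (defaults to uniform);
    the sum is normalized by the total weight.
    """
    pred, target = np.asarray(pred), np.asarray(target)
    if weights is None:
        weights = np.ones(len(pred))
    per_step = ((pred - target) ** 2).mean(axis=1)  # one error term per step
    return float(np.dot(weights, per_step) / weights.sum())

# Toy usage: a perfect 8-step rollout incurs zero loss.
t = np.random.rand(HORIZON, 4)
print(prediction_loss(t, t))  # 0.0
```

Normalizing by the total weight keeps the loss scale comparable across different weighting schedules, which is one plausible reading of "normalized weighted sum" in the quoted setup.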