Contrastive Learning of Structured World Models

Authors: Thomas Kipf, Elise van der Pol, Max Welling

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate C-SWMs on compositional environments involving multiple interacting objects that can be manipulated independently by an agent, simple Atari games, and a multi-object physics simulation. Our experiments demonstrate that C-SWMs can overcome limitations of models based on pixel reconstruction and outperform typical representatives of this model class in highly structured environments, while learning interpretable object-based representations."
Researcher Affiliation | Academia | Thomas Kipf, University of Amsterdam, t.n.kipf@uva.nl; Elise van der Pol, University of Amsterdam, UvA-Bosch Delta Lab, e.e.vanderpol@uva.nl; Max Welling, University of Amsterdam, CIFAR, m.welling@uva.nl
Pseudocode | No | No explicit pseudocode block or algorithm figure is provided; the methodology is described in text and mathematical equations.
Open Source Code | Yes | "Our implementation is available under https://github.com/tkipf/c-swm."
Open Datasets | Yes | "We make use of the Arcade Learning Environment (Bellemare et al., 2013) to create a small environment based on the Atari 2600 game Pong... We use the PongDeterministic-v4 variant of the environment in OpenAI Gym (Brockman et al., 2016). ... This environment is adapted from Jaques et al. (2019) using their publicly available implementation."
Dataset Splits | No | The paper describes distinct training and evaluation/test sets with specific episode counts (e.g., "We train C-SWMs on an experience buffer... For evaluation, we populate a separate experience buffer..."), but it does not explicitly mention a dedicated validation set or its split percentage/count.
Hardware Specification | Yes | "Both C-SWM and the World Model baseline trained for approx. 1 hour wall-clock time on the 2D shapes dataset, approx. 2 hours on the 3D cubes dataset, and approx. 30 min on the 3-body physics environment using a single NVIDIA GTX 1080 Ti GPU. A notable exception is the PAIG baseline model (Jaques et al., 2019), which trained for approx. 6 hours on an NVIDIA Titan X Pascal GPU using the settings recommended by the authors of the paper."
Software Dependencies | No | The paper mentions software such as Matplotlib and the Adam optimizer, but it does not specify version numbers for these or for any other key dependencies (e.g., Python or PyTorch).
Experiment Setup | Yes | "All models are trained for 100 epochs (200 for Atari games) using the Adam (Kingma & Ba, 2014) optimizer with a learning rate of 5 × 10⁻⁴ and a batch size of 1024 (512 for baselines with decoders due to higher memory demands, and 100 for PAIG as suggested by the authors). The margin in the hinge loss is chosen as γ = 1. We further multiply the squared Euclidean distance d(x, y) in the loss function by a factor of 0.5/σ² with σ = 0.5 to control the spread of the embeddings."
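The hinge-loss setup quoted above (margin γ = 1 and squared Euclidean distance scaled by 0.5/σ² with σ = 0.5) can be sketched as follows. This is a minimal NumPy illustration under our own reading of the quoted description, not the authors' implementation (their PyTorch code is in the linked repository); the function and variable names are ours.

```python
import numpy as np

SIGMA = 0.5   # spread hyperparameter sigma from the paper
GAMMA = 1.0   # hinge-loss margin gamma from the paper

def d(x, y):
    # Squared Euclidean distance scaled by 0.5 / sigma^2,
    # as described in the experiment setup.
    return 0.5 / SIGMA**2 * np.sum((x - y) ** 2, axis=-1)

def contrastive_hinge_loss(z_pred, z_next, z_neg):
    # Positive term: the predicted next latent state should be close
    # to the encoding of the true next state.
    positive = d(z_pred, z_next)
    # Negative term: a randomly sampled (negative) state is pushed
    # at least GAMMA away from the true next state via a hinge.
    negative = np.maximum(0.0, GAMMA - d(z_neg, z_next))
    return np.mean(positive + negative)
```

With a perfect prediction and a distant negative sample, both terms vanish and the loss is zero; a negative sample that collides with the true next state contributes the full margin GAMMA.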