SCALOR: Generative World Models with Scalable Object Representations
Authors: Jindong Jiang*, Sepehr Janghorbani*, Gerard de Melo, Sungjin Ahn
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we describe the experiments conducted to empirically evaluate the performance of SCALOR. We propose two tasks: (i) synthetic MNIST/dSprites shapes and (ii) natural-scene CCTV footage of walking pedestrians. We will show SCALOR's abilities to detect and track objects, to generate future trajectories, and to generalize to unseen settings. Furthermore, we provide a quantitative comparison to state-of-the-art baselines. |
| Researcher Affiliation | Academia | Jindong Jiang, Sepehr Janghorbani, Gerard de Melo & Sungjin Ahn (Rutgers University) |
| Pseudocode | Yes | Appendix A (Algorithms): Algorithm 1: Discovery Proposal-Rejection Inference; Algorithm 2: Propagation Inference; Algorithm 3: Background Module and Rendering |
| Open Source Code | No | Full details of the architecture will be released along with our code. |
| Open Datasets | Yes | We first evaluate our model on datasets of moving dSprites shapes as well as moving MNIST digits. ... Specifically, we consider the Crowded Grand Central Station dataset (Zhou et al., 2012)... |
| Dataset Splits | No | For the natural-scene experiments, we spatially split the video into 8 parts and create a dataset of 400k frames in total. We choose the first 360k frames for training and the remaining 40k frames for testing. No explicit mention of a validation split was found. (A minimal split sketch follows the table.) |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory) were provided for the experimental setup. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library or solver names with versions) were provided. |
| Experiment Setup | Yes | We choose a batch size of 20 for the natural-scene experiments and a batch size of 16 for the MNIST/dSprites experiments. The learning rate is fixed at 4e-5 for natural-image experiments and 5e-4 for dSprites/MNIST experiments. We use RMSprop for optimization during training. The standard deviation of the image distribution is set to 0.1 for natural experiments and 0.2 for toy experiments. The prior for all Gaussian posteriors is the standard normal. For the pedestrian tracking dataset, we constrain the range of z^scale so that the inferred width can vary from 5.2 to 11.7 pixels and the height from 12.0 to 28.8 pixels, both with a prior at the middle of the range during discovery. Similarly, we constrain z^scale on the synthetic datasets to vary from 0.5 to 1.5 times the actual object size. The z^pos variable in the propagation phase is modeled as the deviation from the previous time step's position rather than as a global coordinate. The prior for z^pres in discovery is set to 0.1 at the beginning of training and quickly annealed to 1e-3 for natural-image experiments and 1e-4 for dSprites/MNIST experiments. The temperature used for modeling z^pres starts at 1.0 and is annealed to 0.3 after 20k iterations. (A hedged configuration sketch of these values follows the table.) |
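To make the Dataset Splits row concrete, here is a minimal Python sketch of the reported split. The frame counts come from the quote above; the function name `split_frames` and the sequence-slicing interface are hypothetical illustration, not the authors' code.

```python
# Hypothetical sketch of the train/test split reported above.
# The counts (8 spatial parts, 400k frames, first 360k for training)
# come from the paper; everything else is an assumption.
NUM_SPATIAL_PARTS = 8
TOTAL_FRAMES = 400_000
TRAIN_FRAMES = 360_000  # remaining 40k frames are held out for testing

def split_frames(frames):
    """Split an ordered frame sequence into train and test sets by index."""
    assert len(frames) == TOTAL_FRAMES
    return frames[:TRAIN_FRAMES], frames[TRAIN_FRAMES:]
```

Note that, as the row observes, the paper reports no validation split, so this sketch holds out frames for testing only.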
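The Experiment Setup row can likewise be read as a training configuration. The sketch below collects the reported values in one place; the dictionary layout, the linear annealing helper (the paper says only that the z^pres prior is "quickly" annealed), and the placeholder model are assumptions rather than the authors' implementation.

```python
import torch

# Reported hyperparameters per experiment family; the numeric values are
# from the paper, the structure is an assumption for illustration.
HPARAMS = {
    "natural": dict(batch_size=20, lr=4e-5, image_std=0.1, z_pres_prior_final=1e-3),
    "toy":     dict(batch_size=16, lr=5e-4, image_std=0.2, z_pres_prior_final=1e-4),
}

def anneal(start, end, step, total=20_000):
    """Linearly anneal from `start` to `end` over `total` steps (schedule shape assumed)."""
    t = min(step / total, 1.0)
    return start + t * (end - start)

cfg = HPARAMS["natural"]
model = torch.nn.Linear(8, 8)  # placeholder standing in for the SCALOR model
optimizer = torch.optim.RMSprop(model.parameters(), lr=cfg["lr"])  # RMSprop per the paper

for step in range(20_000):
    tau = anneal(1.0, 0.3, step)                                 # temperature for z^pres, 1.0 -> 0.3
    z_pres_prior = anneal(0.1, cfg["z_pres_prior_final"], step)  # annealed z^pres prior
    # ... forward pass and ELBO with image-likelihood std cfg["image_std"],
    # standard-normal priors on the Gaussian latents, then optimizer.step()
```

A design note: expressing the z^pres prior and the temperature as functions of the global step, as above, makes the annealing schedules easy to log and reproduce.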