Robust and Controllable Object-Centric Learning through Energy-based Models

Authors: Ruixiang Zhang, Tong Che, Boris Ivanovic, Renhao Wang, Marco Pavone, Yoshua Bengio, Liam Paull

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 EXPERIMENTS: We quantitatively and qualitatively evaluate our proposed model on the task of unsupervised object discovery, with the goal of decomposing the visual scene into a set of objects without any human supervision. As we will show, our approach is able to consistently segment images into highly interpretable and meaningful object masks."
Researcher Affiliation | Collaboration | Nvidia Research; Stanford University; Mila, Université de Montréal
Pseudocode | Yes | "Algorithm 1: Training procedure of EGO for unsupervised object discovery."
Open Source Code | Yes | "Source code necessary to reproduce results of our model is made available in the supplementary material."
Open Datasets | Yes | "Datasets: In line with previous state-of-the-art works on object discovery, we use the following three multi-object datasets (Kabra et al., 2019): CLEVR (Johnson et al., 2017), Multi-dSprites (Matthey et al., 2017), and Tetrominoes (Greff et al., 2019)."
Dataset Splits | No | No explicit mention of a validation split was found; the text specifies only training and test splits: "the first 70K samples from the CLEVR-6 dataset and the first 60K samples from the Multi-dSprites and Tetrominoes datasets are used for training. Evaluation is performed on 320 test data examples."
Hardware Specification | Yes | "We train our models on 8 Nvidia A100 GPUs, and the training time on the CLEVR dataset is about 1 day, and within a few hours on the Multi-dSprites and Tetrominoes datasets."
Software Dependencies | No | Libraries are named but no version numbers are given: "We implemented our model in Jax (Bradbury et al., 2018) and Flax (Heek et al., 2020)."
Experiment Setup | Yes | "We use Dz = 64 for the latent variable dimension, as in baseline methods. We use K = 7 latent variables for CLEVR-6, K = 6 for Multi-dSprites, and K = 4 for Tetrominoes, which is one more than the maximum number of objects in the corresponding datasets. In Langevin MCMC sampling, we set the step size ϵ = 0.1 and the number of Langevin steps T = 5. We train the model using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0002 for 500K iterations, with a batch size of 128."
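
The quoted setup pins down the sampler hyperparameters (K latent slots of dimension Dz = 64, step size ϵ = 0.1, T = 5 Langevin steps), and the paper states the implementation is in JAX. Below is a minimal, hedged sketch of what such a Langevin refinement loop could look like under those settings; `energy_fn` and `my_energy_fn` are hypothetical stand-ins for the paper's energy network, and the exact discretization and noise schedule follow the paper, not this sketch.

```python
import jax
import jax.numpy as jnp

# Hyperparameters quoted in the Experiment Setup row above (CLEVR-6).
D_Z = 64   # latent dimension per slot
K = 7      # number of latent slots
EPS = 0.1  # Langevin step size (epsilon)
T = 5      # number of Langevin steps

def langevin_sample(energy_fn, z_init, x, key):
    """Refine slot latents z with T Langevin MCMC steps given an image x.

    `energy_fn(z, x) -> scalar` is a hypothetical placeholder for the
    model's energy network; only the hyperparameters come from the paper.
    """
    grad_energy = jax.grad(lambda z: energy_fn(z, x))

    def step(z, step_key):
        noise = jax.random.normal(step_key, z.shape)
        # Standard unadjusted Langevin update; the paper's exact
        # discretization and noise scaling may differ.
        z = z - 0.5 * EPS * grad_energy(z) + jnp.sqrt(EPS) * noise
        return z, None

    z, _ = jax.lax.scan(step, z_init, jax.random.split(key, T))
    return z

# Usage: initialize K slot latents from a Gaussian prior, then refine.
key = jax.random.PRNGKey(0)
z0 = jax.random.normal(key, (K, D_Z))
# z = langevin_sample(my_energy_fn, z0, image, key)  # my_energy_fn: hypothetical
```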
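Similarly, the Pseudocode row cites Algorithm 1, the EGO training procedure. Without reproducing that algorithm, a hedged skeleton of one optimization step under the quoted optimizer settings (Adam, learning rate 0.0002) might look as follows; `loss_fn` is a hypothetical placeholder for the paper's actual EBM objective.

```python
import jax
import optax

optimizer = optax.adam(2e-4)  # Adam with the quoted learning rate of 0.0002

def loss_fn(params, batch, key):
    # Hypothetical placeholder: the real objective is the EBM training
    # loss defined in the paper's Algorithm 1.
    raise NotImplementedError

def train_step(params, opt_state, batch, key):
    """One gradient step; only the optimizer settings come from the paper."""
    loss, grads = jax.value_and_grad(loss_fn)(params, batch, key)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss
```

Per the quoted setup, this step would be repeated for 500K iterations with batches of 128 images.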