Robust and Controllable Object-Centric Learning through Energy-based Models

Authors: Ruixiang Zhang, Tong Che, Boris Ivanovic, Renhao Wang, Marco Pavone, Yoshua Bengio, Liam Paull

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 EXPERIMENTS: We quantitatively and qualitatively evaluate our proposed model on the task of unsupervised object discovery, with the goal of decomposing the visual scene into a set of objects without any human supervision. As we will show, our approach is able to consistently segment images into highly interpretable and meaningful object masks."
Researcher Affiliation | Collaboration | Nvidia Research; Stanford University; Mila, Université de Montréal
Pseudocode | Yes | "Algorithm 1: Training procedure of EGO for unsupervised object discovery."
Open Source Code | Yes | "Source code necessary to reproduce results of our model is made available in the supplementary material."
Open Datasets | Yes | "Datasets: In line with previous state-of-the-art works on object discovery, we use the following three multi-object datasets (Kabra et al., 2019): CLEVR (Johnson et al., 2017), Multi-dSprites (Matthey et al., 2017), and Tetrominoes (Greff et al., 2019)."
Dataset Splits | No | No explicit mention of a validation split was found; the text specifies only training and test splits: "the first 70K samples from the CLEVR-6 dataset and the first 60K samples from the Multi-dSprites and Tetrominoes datasets are used for training. Evaluation is performed on 320 test data examples."
Hardware Specification | Yes | "We train our models on 8 Nvidia A100 GPUs, and the training time on the CLEVR dataset is about 1 day, and within a few hours on the Multi-dSprites and Tetrominoes datasets."
Software Dependencies | No | Libraries are named but no version numbers are given: "We implemented our model in Jax (Bradbury et al., 2018) and Flax (Heek et al., 2020)."
Experiment Setup | Yes | "We use Dz = 64 for the latent variable dimension, as in baseline methods. We use K = 7 latent variables for CLEVR-6, K = 6 for Multi-dSprites, and K = 4 for Tetrominoes, which is one more than the maximum number of objects in the corresponding datasets. In Langevin MCMC sampling, we set the step size ϵ = 0.1 and the number of Langevin steps T = 5. We train the model using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0002 for 500K iterations, with a batch size of 128."
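
The quoted setup pins down the sampler hyperparameters (K latent slots of dimension Dz = 64, step size ϵ = 0.1, T = 5 Langevin steps), and the paper states the implementation is in JAX. Below is a minimal, hedged sketch of what such a Langevin refinement loop could look like under those settings; `energy_fn` and `my_energy_fn` are hypothetical stand-ins for the paper's energy network, and the exact discretization and noise schedule follow the paper, not this sketch.

```python
import jax
import jax.numpy as jnp

# Hyperparameters quoted in the Experiment Setup row above (CLEVR-6).
D_Z = 64   # latent dimension per slot
K = 7      # number of latent slots
EPS = 0.1  # Langevin step size (epsilon)
T = 5      # number of Langevin steps

def langevin_sample(energy_fn, z_init, x, key):
    """Refine slot latents z with T Langevin MCMC steps given an image x.

    `energy_fn(z, x) -> scalar` is a hypothetical placeholder for the
    model's energy network; only the hyperparameters come from the paper.
    """
    grad_energy = jax.grad(lambda z: energy_fn(z, x))

    def step(z, step_key):
        noise = jax.random.normal(step_key, z.shape)
        # Standard unadjusted Langevin update; the paper's exact
        # discretization and noise scaling may differ.
        z = z - 0.5 * EPS * grad_energy(z) + jnp.sqrt(EPS) * noise
        return z, None

    z, _ = jax.lax.scan(step, z_init, jax.random.split(key, T))
    return z

# Usage: initialize K slot latents from a Gaussian prior, then refine.
key = jax.random.PRNGKey(0)
z0 = jax.random.normal(key, (K, D_Z))
# z = langevin_sample(my_energy_fn, z0, image, key)  # my_energy_fn: hypothetical
```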
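Similarly, the Pseudocode row cites Algorithm 1, the EGO training procedure. Without reproducing that algorithm, a hedged skeleton of one optimization step under the quoted optimizer settings (Adam, learning rate 0.0002) might look as follows; `loss_fn` is a hypothetical placeholder for the paper's actual EBM objective.

```python
import jax
import optax

optimizer = optax.adam(2e-4)  # Adam with the quoted learning rate of 0.0002

def loss_fn(params, batch, key):
    # Hypothetical placeholder: the real objective is the EBM training
    # loss defined in the paper's Algorithm 1.
    raise NotImplementedError

def train_step(params, opt_state, batch, key):
    """One gradient step; only the optimizer settings come from the paper."""
    loss, grads = jax.value_and_grad(loss_fn)(params, batch, key)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss
```

Per the quoted setup, this step would be repeated for 500K iterations with batches of 128 images.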