Robust and Controllable Object-Centric Learning through Energy-based Models
Authors: Ruixiang Zhang, Tong Che, Boris Ivanovic, Renhao Wang, Marco Pavone, Yoshua Bengio, Liam Paull
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): 'We quantitatively and qualitatively evaluate our proposed model on the task of unsupervised object discovery, with the goal of decomposing the visual scene into a set of objects without any human supervision. As we will show, our approach is able to consistently segment images into highly interpretable and meaningful object masks.' |
| Researcher Affiliation | Collaboration | NVIDIA Research; Stanford University; Mila, Université de Montréal |
| Pseudocode | Yes | Algorithm 1: Training procedure of EGO for unsupervised object discovery. |
| Open Source Code | Yes | Source code necessary to reproduce results of our model is made available in the supplementary material. |
| Open Datasets | Yes | Datasets In line with previous state-of-the-art works on object discovery, we use the following three multi-object datasets (Kabra et al., 2019): CLEVR (Johnson et al., 2017), Multi-dSprites (Matthey et al., 2017), and Tetrominoes (Greff et al., 2019). |
| Dataset Splits | No | No explicit mention of a validation set split was found. The text specifies only training and test data splits: 'the first 70K samples from the CLEVR-6 dataset and the first 60K samples from the Multi-dSprites and Tetrominoes datasets are used for training. Evaluation is performed on 320 test data examples.' |
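The split sizes quoted above can be sketched as index ranges. This is a minimal illustration, not the authors' code: taking the "first N" samples for training is stated in the paper, but where the 320 evaluation examples are drawn from is an assumption (here, the samples immediately after the training slice).

```python
# Training-set sizes quoted in the table; the test offset is a
# hypothetical choice, since the paper does not specify it.
TRAIN_SIZES = {"CLEVR-6": 70_000, "Multi-dSprites": 60_000, "Tetrominoes": 60_000}
NUM_TEST = 320

def split_indices(dataset_name):
    """Return (train_indices, test_indices) as ranges: the first N
    samples for training, the next 320 for evaluation (assumed)."""
    n_train = TRAIN_SIZES[dataset_name]
    train = range(0, n_train)
    test = range(n_train, n_train + NUM_TEST)
    return train, test

train, test = split_indices("CLEVR-6")
print(len(train), len(test))  # 70000 320
```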
| Hardware Specification | Yes | We train our models on 8 Nvidia A100 GPUs, and the training time on CLEVR dataset is about 1 day, and within a few hours on Multi-dSprites and Tetrominoes datasets. |
| Software Dependencies | No | Framework names are given but without version numbers: 'We implemented our model in Jax (Bradbury et al., 2018) and Flax (Heek et al., 2020).' |
| Experiment Setup | Yes | We use Dz = 64 for the latent variable dimension, as in baseline methods. We use K = 7 latent variables for CLEVR-6, K = 6 for Multi-dSprites, and K = 4 for Tetrominoes, which is one more than the maximum number of objects in the corresponding datasets. In Langevin MCMC sampling, we set the step size ϵ = 0.1 and the number of Langevin steps T = 5. We train the model using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0002 for 500K iterations, with a batch size of 128. |
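The Langevin MCMC hyperparameters above (ϵ = 0.1, T = 5, Dz = 64, K slots) can be illustrated with a short NumPy sketch. This is not the paper's implementation: the energy function below is a toy quadratic standing in for the learned EBM, and the exact Langevin update parameterization (how ϵ scales the gradient and noise terms) is an assumption, since conventions vary.

```python
import numpy as np

# Hyperparameters quoted in the table (Section 4 of the paper).
D_Z = 64        # latent dimension per slot
K_CLEVR = 7     # slots for CLEVR-6 (max objects + 1)
STEP_SIZE = 0.1 # Langevin step size epsilon
NUM_STEPS = 5   # Langevin steps T

def grad_energy(z):
    """Gradient of a toy energy E(z) = 0.5 * ||z||^2, a hypothetical
    stand-in for the learned energy network's gradient."""
    return z

def langevin_sample(z_init, step_size=STEP_SIZE, num_steps=NUM_STEPS, rng=None):
    """Unadjusted Langevin dynamics under one common parameterization:
    z <- z - (eps/2) * grad E(z) + sqrt(eps) * noise.
    The paper's exact update rule may differ."""
    rng = np.random.default_rng(0) if rng is None else rng
    z = z_init.copy()
    for _ in range(num_steps):
        noise = rng.standard_normal(z.shape)
        z = z - 0.5 * step_size * grad_energy(z) + np.sqrt(step_size) * noise
    return z

# One set of K slot latents for a CLEVR-6 scene, refined for T steps.
z0 = np.zeros((K_CLEVR, D_Z))
z_T = langevin_sample(z0)
print(z_T.shape)  # (7, 64)
```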