Multi-Object Representation Learning with Iterative Variational Inference
Authors: Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Christopher Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on three main datasets: 1) CLEVR (Johnson et al., 2017) and a variant CLEVR6 which uses only scenes with up to 6 objects, 2) a multi-object version of the dSprites dataset (Matthey et al., 2017), and 3) a dataset of multiple Tetris-like pieces that we created. In all cases we train the system using the Adam optimizer (Kingma & Ba, 2015) to minimize the negative ELBO for 10^6 updates. To quantify segmentation quality, we measure the similarity between ground-truth (instance) segmentations and our predicted object masks using the Adjusted Rand Index (ARI; Rand 1971; Hubert & Arabie 1985). As shown in Table 1, IODINE achieves almost perfect ARI scores of around 0.99 for CLEVR6, and Tetris as well as a relatively good score of 0.77 for Multi-dSprites. Input Ablations: We ablated each of the different inputs to the refinement network described in Section 2.2. |
| Researcher Affiliation | Collaboration | Klaus Greff 1 2, Raphaël Lopez Kaufman 3, Rishabh Kabra 3, Nick Watters 3, Chris Burgess 3, Daniel Zoran 3, Loic Matthey 3, Matthew Botvinick 3, Alexander Lerchner 3. 1 The Swiss AI lab IDSIA, Lugano, Switzerland; 2 Work done at DeepMind; 3 DeepMind, London, UK. Correspondence to: Klaus Greff <klaus.greff@startmail.com>. |
| Pseudocode | Yes | Algorithm 1 IODINE Pseudocode. |
| Open Source Code | No | Explanation: The paper does not provide an explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We evaluate our model on three main datasets: 1) CLEVR (Johnson et al., 2017) and a variant CLEVR6 which uses only scenes with up to 6 objects, 2) a multi-object version of the dSprites dataset (Matthey et al., 2017), and 3) a dataset of multiple Tetris-like pieces that we created. |
| Dataset Splits | No | Explanation: The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages or sample counts) in the main text needed to reproduce the experiment. It mentions using specific datasets but not the partitioning details. |
| Hardware Specification | No | Explanation: The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | Explanation: The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We varied several hyperparameters, including: number of slots, dimensionality of z_k, number of inference iterations, number of convolutional layers and their filter sizes, batch size, and learning rate. For details of the models and hyperparameters refer to Appendix C. |
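
The Research Type row cites the Adjusted Rand Index (ARI) as the measure of segmentation quality. As a rough illustration of what that score captures, the sketch below computes the standard Hubert & Arabie ARI between two flat label assignments using only the Python standard library. The function name `adjusted_rand_index` and the toy label lists are hypothetical and not taken from the paper. The excerpt above does not say which pixels the paper includes in the comparison, so this is not a reproduction of the authors' evaluation protocol.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two flat label assignments of equal length."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)

    # Contingency counts: how many items share each (true, predicted) label pair.
    pair_counts = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)

    # Number of item pairs grouped together in both, in truth, and in prediction.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total = comb(n, 2)

    # Expected agreement under random labelling with the same cluster sizes.
    expected = sum_true * sum_pred / total if total else 0.0
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical toy "image" of six pixels, flattened; integers are object ids.
ground_truth = [0, 0, 1, 1, 2, 2]
permuted = [2, 2, 0, 0, 1, 1]    # same grouping, different ids
imperfect = [0, 0, 1, 2, 2, 2]   # one pixel assigned to the wrong object

ari_permuted = adjusted_rand_index(ground_truth, permuted)
ari_imperfect = adjusted_rand_index(ground_truth, imperfect)
print(ari_permuted, ari_imperfect)
```

Because ARI depends only on which items are grouped together, relabelling the predicted masks leaves the score unchanged. This matters for slot-based models, where the order of predicted objects is arbitrary.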