Object-Centric Learning with Slot Attention
Authors: Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The goal of this section is to evaluate the Slot Attention module on two object-centric tasks, one supervised and the other unsupervised, as described in Sections 2.2 and 2.3. We compare against specialized state-of-the-art methods [16, 17, 31] for each respective task. |
| Researcher Affiliation | Collaboration | 1Google Research, Brain Team 2Dept. of Computer Science, ETH Zurich 3Max-Planck Institute for Intelligent Systems |
| Pseudocode | Yes | Algorithm 1: Slot Attention module. The input is a set of $N$ vectors of dimension $D_{\text{inputs}}$ which is mapped to a set of $K$ slots of dimension $D_{\text{slots}}$. We initialize the slots by sampling their initial values as independent samples from a Gaussian distribution with shared, learnable parameters $\mu \in \mathbb{R}^{D_{\text{slots}}}$ and $\sigma \in \mathbb{R}^{D_{\text{slots}}}$. In our experiments we set the number of iterations to $T = 3$. A minimal code sketch of this module follows the table. |
| Open Source Code | Yes | An implementation of Slot Attention is available at: https://github.com/google-research/google-research/tree/master/slot_attention |
| Open Datasets | Yes | For the object discovery experiments, we use the following three multi-object datasets [83]: CLEVR (with masks), Multi-dSprites, and Tetrominoes. ... [83] Rishabh Kabra, Chris Burgess, Loic Matthey, Raphael Lopez Kaufman, Klaus Greff, Malcolm Reynolds, and Alexander Lerchner. Multi-object datasets. https://github.com/deepmind/multi_object_datasets/, 2019. |
| Dataset Splits | Yes | For set prediction, we use the original CLEVR dataset [84] which contains a training-validation split of 70K and 15K images of rendered objects respectively. |
| Hardware Specification | Yes | On CLEVR6, we can use a batch size of up to 64 on a single V100 GPU with 16GB of RAM as opposed to 4 in [16] using the same type of hardware. ... The Slot Attention model is trained using a single NVIDIA Tesla V100 GPU with 16GB of RAM. |
| Software Dependencies | No | The paper does not specify version numbers for software dependencies such as libraries or frameworks. |
| Experiment Setup | Yes | We train the model using the Adam optimizer [85] with a learning rate of $4 \times 10^{-4}$ and a batch size of 64 (using a single GPU). ... At training time, we use $T = 3$ iterations of Slot Attention. We use the same training setting across all datasets, apart from the number of slots $K$: we use $K = 7$ slots for CLEVR6, $K = 6$ slots for Multi-dSprites (max. 5 objects per scene), and $K = 4$ for Tetrominoes (3 objects per scene). An illustrative configuration snippet follows the table. |
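The Slot Attention update quoted in the pseudocode row is compact enough to sketch in code. Below is a minimal PyTorch rendition of Algorithm 1 as the paper describes it: normalize the inputs, let slots compete for inputs via a softmax over the slot axis, take an attention-weighted mean, and update each slot with a shared GRU plus a residual MLP. The authors' released implementation is in TensorFlow; this sketch, including the class name, hyperparameter defaults, and initialization details, is an illustrative assumption rather than the official API.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Minimal sketch of Algorithm 1 (Slot Attention); names are illustrative."""

    def __init__(self, num_slots=7, dim=64, iters=3, hidden_dim=128, eps=1e-8):
        super().__init__()
        self.num_slots = num_slots
        self.iters = iters          # T in the paper
        self.eps = eps
        self.scale = dim ** -0.5    # 1 / sqrt(D) attention scaling

        # Shared, learnable Gaussian parameters for slot initialization
        # (initialized simply here; the paper uses its own init scheme).
        self.slots_mu = nn.Parameter(torch.zeros(1, 1, dim))
        self.slots_log_sigma = nn.Parameter(torch.zeros(1, 1, dim))

        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)

        # Linear maps: queries from slots, keys/values from inputs.
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, inputs):
        # inputs: (batch, N, dim) set of input feature vectors.
        b, n, d = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)

        # Sample the K initial slots from the shared Gaussian.
        sigma = self.slots_log_sigma.exp()
        slots = self.slots_mu + sigma * torch.randn(
            b, self.num_slots, d, device=inputs.device
        )

        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))

            # Dot-product attention with softmax over slots,
            # so the slots compete to explain each input.
            logits = torch.einsum('bnd,bkd->bnk', k, q) * self.scale
            attn = logits.softmax(dim=-1) + self.eps

            # Weighted mean of the values for each slot.
            attn = attn / attn.sum(dim=1, keepdim=True)
            updates = torch.einsum('bnk,bnd->bkd', attn, v)

            # Slot update: shared GRU followed by a residual MLP.
            slots = self.gru(
                updates.reshape(-1, d), slots_prev.reshape(-1, d)
            ).reshape(b, -1, d)
            slots = slots + self.mlp(self.norm_mlp(slots))

        return slots
```

The softmax over the slot axis (rather than the input axis, as in standard self-attention) is what makes the slots exchangeable and forces them to partition the input set.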
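Tying the experiment-setup row back to that sketch, here is a hypothetical usage fragment with the quoted CLEVR6 hyperparameters (Adam, learning rate $4 \times 10^{-4}$, batch size 64, $T = 3$, $K = 7$). It continues the snippet above; the encoder producing `features` is omitted, and the input shape is a placeholder, not a value from the paper.

```python
# Hypothetical setup with the quoted CLEVR6 hyperparameters.
model = SlotAttention(num_slots=7, dim=64, iters=3)        # K = 7, T = 3
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)  # lr = 4 * 10^-4

features = torch.randn(64, 32 * 32, 64)  # batch of 64 encoded feature sets
slots = model(features)                  # -> (64, 7, 64), one vector per slot
```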