Object-Centric Learning with Slot Attention

Authors: Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The goal of this section is to evaluate the Slot Attention module on two object-centric tasks, one supervised and the other unsupervised, as described in Sections 2.2 and 2.3. We compare against specialized state-of-the-art methods [16, 17, 31] for each respective task."
Researcher Affiliation | Collaboration | ¹Google Research, Brain Team; ²Dept. of Computer Science, ETH Zurich; ³Max-Planck Institute for Intelligent Systems
Pseudocode | Yes | "Algorithm 1 Slot Attention module. The input is a set of N vectors of dimension D_inputs which is mapped to a set of K slots of dimension D_slots. We initialize the slots by sampling their initial values as independent samples from a Gaussian distribution with shared, learnable parameters µ ∈ R^{D_slots} and σ ∈ R^{D_slots}. In our experiments we set the number of iterations to T = 3." (An illustrative code sketch of this module follows the table.)
Open Source Code | Yes | "An implementation of Slot Attention is available at: https://github.com/google-research/google-research/tree/master/slot_attention"
Open Datasets | Yes | "For the object discovery experiments, we use the following three multi-object datasets [83]: CLEVR (with masks), Multi-dSprites, and Tetrominoes. ... [83] Rishabh Kabra, Chris Burgess, Loic Matthey, Raphael Lopez Kaufman, Klaus Greff, Malcolm Reynolds, and Alexander Lerchner. Multi-object datasets. https://github.com/deepmind/multi_object_datasets/, 2019."
Dataset Splits | Yes | "For set prediction, we use the original CLEVR dataset [84], which contains a training-validation split of 70K and 15K images of rendered objects, respectively."
Hardware Specification | Yes | "On CLEVR6, we can use a batch size of up to 64 on a single V100 GPU with 16GB of RAM as opposed to 4 in [16] using the same type of hardware. ... The Slot Attention model is trained using a single NVIDIA Tesla V100 GPU with 16GB of RAM."
Software Dependencies | No | The paper does not specify version numbers for software dependencies such as libraries or frameworks.
Experiment Setup | Yes | "We train the model using the Adam optimizer [85] with a learning rate of 4 × 10^-4 and a batch size of 64 (using a single GPU). ... At training time, we use T = 3 iterations of Slot Attention. We use the same training setting across all datasets, apart from the number of slots K: we use K = 7 slots for CLEVR6, K = 6 slots for Multi-dSprites (max. 5 objects per scene), and K = 4 for Tetrominoes (3 objects per scene)." (A configuration sketch based on these values follows the table.)
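
To make the pseudocode row above concrete, here is a minimal, unofficial PyTorch-style sketch of the Slot Attention iteration described in Algorithm 1: slots are sampled from a Gaussian with shared, learnable µ and σ, then refined for T iterations with attention over the inputs, a GRU update, and a residual MLP. The official implementation is the TensorFlow code linked in the Open Source Code row; the layer widths, parameter initialisation, and variable names below are illustrative assumptions, not the authors' exact settings.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Unofficial sketch of the Slot Attention module (Algorithm 1).

    Maps a set of N input vectors (dim D_inputs) to K slots (dim D_slots)
    over T iterations. Internal widths and initialisation are assumptions.
    """

    def __init__(self, num_slots, dim_inputs, dim_slots, num_iters=3, eps=1e-8):
        super().__init__()
        self.num_slots, self.num_iters, self.eps = num_slots, num_iters, eps
        self.scale = dim_slots ** -0.5

        # Shared, learnable Gaussian parameters for slot initialisation
        # (zero-mean, unit-sigma start is an assumption, not the paper's init).
        self.slots_mu = nn.Parameter(torch.zeros(1, 1, dim_slots))
        self.slots_log_sigma = nn.Parameter(torch.zeros(1, 1, dim_slots))

        self.norm_inputs = nn.LayerNorm(dim_inputs)
        self.norm_slots = nn.LayerNorm(dim_slots)
        self.norm_mlp = nn.LayerNorm(dim_slots)

        # Linear maps for attention: queries from slots, keys/values from inputs.
        self.to_q = nn.Linear(dim_slots, dim_slots, bias=False)
        self.to_k = nn.Linear(dim_inputs, dim_slots, bias=False)
        self.to_v = nn.Linear(dim_inputs, dim_slots, bias=False)

        # Slot update: GRU followed by a small residual MLP.
        self.gru = nn.GRUCell(dim_slots, dim_slots)
        self.mlp = nn.Sequential(
            nn.Linear(dim_slots, dim_slots), nn.ReLU(), nn.Linear(dim_slots, dim_slots)
        )

    def forward(self, inputs):                       # inputs: (B, N, D_inputs)
        b = inputs.shape[0]
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)  # (B, N, D_slots)

        # Sample initial slots from the learned Gaussian.
        sigma = self.slots_log_sigma.exp()
        slots = self.slots_mu + sigma * torch.randn(
            b, self.num_slots, sigma.shape[-1], device=inputs.device
        )

        for _ in range(self.num_iters):              # T = 3 in the paper
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))    # (B, K, D_slots)

            # Softmax over the slot axis: slots compete for input features.
            attn = torch.softmax(
                torch.einsum('bnd,bkd->bnk', k, q) * self.scale, dim=-1
            )
            attn = attn + self.eps
            attn = attn / attn.sum(dim=1, keepdim=True)       # weighted mean over inputs
            updates = torch.einsum('bnk,bnd->bkd', attn, v)   # (B, K, D_slots)

            # Recurrent update per slot, then residual MLP.
            slots = self.gru(
                updates.reshape(-1, updates.shape[-1]),
                slots_prev.reshape(-1, updates.shape[-1]),
            ).reshape(b, self.num_slots, -1)
            slots = slots + self.mlp(self.norm_mlp(slots))
        return slots
```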
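The Experiment Setup row can likewise be summarised as a small configuration sketch. Only the learning rate (4 × 10^-4), batch size (64), iteration count (T = 3), and slot counts (K = 7/6/4) come from the quoted text; the helper name `build_model_and_optimizer`, the dataset keys, and the 64-dimensional feature sizes are hypothetical, and `SlotAttention` refers to the sketch above.

```python
import torch

# Per-dataset number of slots K from the quoted setup; T = 3 iterations everywhere.
NUM_SLOTS = {"clevr6": 7, "multi_dsprites": 6, "tetrominoes": 4}
BATCH_SIZE = 64  # single-GPU batch size quoted above

def build_model_and_optimizer(dataset: str, dim_inputs: int = 64, dim_slots: int = 64):
    """Hypothetical helper tying the quoted hyperparameters together.

    The feature dimensions (64) are assumptions for illustration; learning
    rate, slot counts, and iteration count follow the Experiment Setup row.
    """
    model = SlotAttention(
        num_slots=NUM_SLOTS[dataset],
        dim_inputs=dim_inputs,
        dim_slots=dim_slots,
        num_iters=3,                                  # T = 3 at training time
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)
    return model, optimizer
```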