Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames

Authors: Ondrej Biza, Sjoerd Van Steenkiste, Mehdi S. M. Sajjadi, Gamaleldin Fathy Elsayed, Aravindh Mahendran, Thomas Kipf

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on a wide range of synthetic object discovery benchmarks, namely Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet, and show promising improvements on the challenging real-world Waymo Open dataset.
Researcher Affiliation | Collaboration | (1) Northeastern University, Boston, MA, USA; (2) Google Research.
Pseudocode | Yes | The pseudo-code of ISA-TSR is provided in Algorithm 1 (a minimal sketch of the core mechanism is given after this table).
Open Source Code | Yes | JAX/FLAX source code: https://github.com/google-research/google-research/tree/master/invariant_slot_attention
Open Datasets | Yes | We evaluate our method on a wide range of synthetic object discovery benchmarks, namely Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet, and show promising improvements on the challenging real-world Waymo Open dataset.
Dataset Splits | Yes | We conduct experiments in two settings: (1) where only a few (64 to 1024) training samples are available to the model, and (2) where training samples are biased so that objects appear only on the left side of the image, while test samples may have objects in all positions.
Hardware Specification | No | The paper does not specify the hardware (e.g., accelerator type or count) used to run the experiments.
Software Dependencies | No | The paper mentions using JAX/FLAX, the Adam optimizer, and a pre-trained Dense Prediction Transformer (DPT), but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The model is trained using Adam (Kingma & Ba, 2015) with a learning rate of 4 × 10⁻⁴ on all datasets except for Waymo Open Depths, where we use 2 × 10⁻⁴. We use a learning-rate warm-up from 0 over the first 50k steps. The batch size is 64. All experiments use 11 slots, except Tetrominoes where we use 4 slots and MultiShapeNet where we use 5 slots. (See the optimizer sketch below.)
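
Since Algorithm 1 itself is not reproduced on this page, the following is a minimal sketch of the core mechanism behind invariant slot attention with translation and scale reference frames (rotation omitted), written in JAX. The tensor shapes, the plain linear projections (w_q, w_k, w_v, w_g), and the omission of the GRU/MLP slot update are simplifying assumptions for illustration; this is not the authors' implementation from the repository above.

    import jax
    import jax.numpy as jnp

    def isa_step(slots, slot_pos, slot_scale, inputs, grid, params, eps=1e-8):
        """One simplified slot-attention step with per-slot reference frames.

        slots:      (S, D) slot representations
        slot_pos:   (S, 2) per-slot 2D positions
        slot_scale: (S, 2) per-slot 2D scales
        inputs:     (N, D) flattened image features
        grid:       (N, 2) absolute coordinates of each feature location
        params:     (w_q, w_k, w_v, w_g) with shapes (D, D) x3 and (2, D)
        """
        w_q, w_k, w_v, w_g = params
        d = slots.shape[-1]

        q = slots @ w_q                                   # (S, D)
        k = inputs @ w_k                                  # (N, D)
        v = inputs @ w_v                                  # (N, D)

        # Express the coordinate grid relative to each slot's frame:
        # subtract the slot position and divide by the slot scale.
        rel_grid = (grid[None] - slot_pos[:, None]) / (slot_scale[:, None] + eps)
        k_slot = k[None] + rel_grid @ w_g                 # (S, N, D) slot-specific keys

        # Standard slot-attention competition: softmax over slots per location,
        # then a weighted mean over locations for each slot.
        logits = jnp.einsum('sd,snd->sn', q, k_slot) / jnp.sqrt(d)
        attn = jax.nn.softmax(logits, axis=0)
        attn = attn / (attn.sum(axis=1, keepdims=True) + eps)
        updates = attn @ v                                # (S, D)

        # Re-estimate each slot's reference frame from its attention map:
        # position = attention-weighted mean of the grid, scale = weighted std.
        new_pos = attn @ grid                             # (S, 2)
        var = jnp.einsum('sn,snk->sk', attn, (grid[None] - new_pos[:, None]) ** 2)
        new_scale = jnp.sqrt(var + eps)

        # The paper uses a GRU followed by an MLP to update the slots; a plain
        # residual update stands in for it here.
        return slots + updates, new_pos, new_scale

In the full method the slot-relative grids also inform the value and decoder paths; the sketch shows only the key path to keep the frame-relative idea visible.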
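
The optimizer settings quoted in the Experiment Setup row map directly onto an optax configuration. The sketch below assumes the warm-up is linear and that the learning rate is held constant afterwards; the post-warm-up schedule is not quoted above, so treat those two choices as assumptions.

    import optax

    BASE_LR = 4e-4        # 2e-4 for Waymo Open Depths, per the quoted setup
    WARMUP_STEPS = 50_000
    BATCH_SIZE = 64       # as reported; slot counts (4/5/11) belong to the model config

    # Linear warm-up from 0 to BASE_LR over the first 50k steps (assumed linear);
    # optax schedules hold the final value after transition_steps.
    lr_schedule = optax.linear_schedule(
        init_value=0.0, end_value=BASE_LR, transition_steps=WARMUP_STEPS)
    optimizer = optax.adam(learning_rate=lr_schedule)

The resulting optimizer is a standard optax GradientTransformation, so it plugs into a FLAX training loop via optimizer.init(params) and optimizer.update(grads, opt_state, params).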