Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames
Authors: Ondrej Biza, Sjoerd Van Steenkiste, Mehdi S. M. Sajjadi, Gamaleldin Fathy Elsayed, Aravindh Mahendran, Thomas Kipf
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a wide range of synthetic object discovery benchmarks, namely Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet, and show promising improvements on the challenging real-world Waymo Open dataset. |
| Researcher Affiliation | Collaboration | ¹Northeastern University, Boston, MA, USA. ²Google Research. |
| Pseudocode | Yes | The pseudo-code of ISA-TSR is provided in Algorithm 1. |
| Open Source Code | Yes | JAX/FLAX source code: https://github.com/google-research/google-research/tree/master/invariant_slot_attention |
| Open Datasets | Yes | We evaluate our method on a wide range of synthetic object discovery benchmarks, namely Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet, and show promising improvements on the challenging real-world Waymo Open dataset. |
| Dataset Splits | Yes | We conduct experiments in two settings: (1) where only a few (64 to 1024) training samples are available to the model, and (2) where training samples are biased to only have objects appear in the left side of the image and test samples may have objects in all positions. |
| Hardware Specification | No | The paper mentions |
| Software Dependencies | No | The paper mentions using JAX/FLAX, Adam optimizer, and a pre-trained Dense Prediction Transformer (DPT) but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The model is trained using Adam (Kingma & Ba, 2015) with a learning rate of 4 × 10⁻⁴ on all datasets except for Waymo Open Depths, where we use 2 × 10⁻⁴. We use a learning rate warm-up going from 0 for 50k steps. The batch size is 64. All experiments use 11 slots except in Tetrominoes where we use 4 slots and MultiShapeNet where we use 5 slots. |
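
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration sketch. This is not taken from the authors' code: the dictionary keys, function names, the linear shape of the warm-up, and the constant rate after warm-up are all assumptions made for illustration. Only the numeric values come from the quoted text.

```python
# Values quoted in the Experiment Setup row; all names here are hypothetical.
PEAK_LR = {"default": 4e-4, "waymo_open": 2e-4}
NUM_SLOTS = {"default": 11, "tetrominoes": 4, "multishapenet": 5}
WARMUP_STEPS = 50_000
BATCH_SIZE = 64


def learning_rate(step: int, dataset: str = "default") -> float:
    """Learning rate at a given training step.

    Assumption: the warm-up is linear from 0 to the peak rate, and the rate
    stays constant afterwards. The quoted text only says the warm-up starts
    at 0 and lasts 50k steps.
    """
    peak = PEAK_LR.get(dataset, PEAK_LR["default"])
    fraction = min(max(step, 0) / WARMUP_STEPS, 1.0)
    return peak * fraction


def num_slots(dataset: str) -> int:
    """Number of slots for a dataset, falling back to the default of 11."""
    return NUM_SLOTS.get(dataset, NUM_SLOTS["default"])
```

A schedule function of this kind could be handed to an Adam optimizer in any framework that accepts a step-dependent learning rate. The exact schedule used in the paper should be checked against the released source.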