Unlocking Slot Attention by Changing Optimal Transport Costs

Authors: Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment, listing each variable, its result, and the LLM's supporting response:

Research Type: Experimental
LLM response: We evaluate our method in slot attention (SA-MESH) on two object detection and two unsupervised object discovery tasks (Section 5). We find that our optimal transport-based variants generally outperform slot attention. Crucially, SA-MESH almost always has the best results, often by a significant margin.

Researcher Affiliation: Collaboration
LLM response: 1 Samsung SAIT AI Lab, Montreal; 2 University of Amsterdam; 3 Mila, Université de Montréal; 4 Canada CIFAR AI Chair; 5 TNO.

Pseudocode: No
LLM response: The paper describes algorithms and methods using mathematical equations and textual descriptions, but it does not include any clearly labeled pseudocode or algorithm blocks.

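Since the method is given only in equations, a minimal sketch of its core building block may help: an entropy-regularized optimal transport plan computed with log-domain Sinkhorn iterations, which the OT variants use in place of slot attention's softmax normalization. Everything below is an illustrative assumption rather than the authors' code: the function name, iteration count, and regularization strength `eps` are placeholders, and the entropy-minimization step that distinguishes SA-MESH is omitted.

```python
import torch

def sinkhorn_plan(scores: torch.Tensor, n_iters: int = 8, eps: float = 1.0) -> torch.Tensor:
    """Illustrative log-domain Sinkhorn: entropy-regularized OT plan.

    scores: (n_slots, n_inputs) similarity logits; the OT cost is -scores.
    Marginals are uniform, so slots compete for a balanced share of the inputs.
    """
    n, m = scores.shape
    log_a = torch.log(torch.ones(n) / n)  # uniform slot (row) marginal
    log_b = torch.log(torch.ones(m) / m)  # uniform input (column) marginal
    f = torch.zeros(n)  # row dual potential
    g = torch.zeros(m)  # column dual potential
    for _ in range(n_iters):
        f = eps * (log_a - torch.logsumexp((scores + g[None, :]) / eps, dim=1))
        g = eps * (log_b - torch.logsumexp((scores + f[:, None]) / eps, dim=0))
    # Plan entries P_ij = exp((f_i + g_j + scores_ij) / eps); with more
    # iterations the row and column sums converge to the uniform marginals.
    return torch.exp((scores + f[:, None] + g[None, :]) / eps)
```

The doubly normalized plan forces every slot to claim some input mass rather than letting all slots attend to the same region, which is the balanced-assignment behavior the paper motivates.
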
Open Source Code: Yes
LLM response: We open-source all of our code https://github.com/davzha/MESH and provide extra experimental details in Appendix F.

Open Datasets: Yes
LLM response: CLEVR (Johnson et al., 2017) is a synthetic dataset... We evaluate on the Multi-dSprites dataset... Additionally, we test on ClevrTex (Karazija et al., 2021)... we build two variants of the CLEVRER video dataset (Yi et al., 2019).

Dataset Splits: Yes
LLM response: We largely follow the same training setup as DSPN and iDSPN (Zhang et al., 2019; 2022)... We closely follow the experimental setup described by Locatello et al. (2020).

Hardware Specification: No
LLM response: The paper mentions that "This research was enabled in part by compute resources provided by Mila (mila.quebec), Calcul Québec (calculquebec.ca), the Digital Research Alliance of Canada (alliancecan.ca)" but does not provide specific details about the CPU, GPU models, memory, or other hardware components used for the experiments.

Software Dependencies: No
LLM response: The paper mentions using the POT package (Flamary et al., 2021) for the EMD algorithm but does not specify version numbers for it or for any other software dependency, such as the programming language or deep learning framework.

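For reference, POT's exact EMD solver has a small surface area; a minimal usage sketch follows, assuming uniform weights and illustrative array shapes (the paper pins no POT version, and this is not the authors' code):

```python
import numpy as np
import ot  # POT: Python Optimal Transport (Flamary et al., 2021)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 32))  # e.g. a predicted set of five 32-d elements
y = rng.normal(size=(5, 32))  # e.g. the target set

a = np.full(5, 1 / 5)  # uniform weights over source elements
b = np.full(5, 1 / 5)  # uniform weights over target elements
M = ot.dist(x, y)      # pairwise squared-Euclidean cost matrix

plan = ot.emd(a, b, M)         # exact optimal transport plan
emd = float((plan * M).sum())  # earth mover's distance under cost M
```
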
Experiment Setup: Yes
LLM response: We generate a dataset of 64,000 data points to train on, each being a multiset with five 32-dimensional objects... we set the number of slots to five. We train all models for 20 epochs with a batch size of 64 (1,000 steps each epoch)... we double the number of channels in the image encoder and decoder to 64, and we double the dimensions of the slots to 128 (with the MLP in slot attention having an intermediate dimension of 256). We again train all models for 530 epochs which correspond to around 330k gradient update steps in this case. The maximum number of objects in an image is 10, so we set the number of slots to 11... We use 8 slots.
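Collected in one place, the quoted numbers amount to two configurations; the grouping and key names below are assumptions for readability, not the authors' configuration format:

```python
# Set prediction experiments (values quoted from the paper).
set_prediction_setup = dict(
    train_size=64_000,  # generated multisets
    set_size=5,         # five 32-dimensional objects per multiset
    element_dim=32,
    num_slots=5,
    epochs=20,
    batch_size=64,      # 1,000 steps per epoch
)

# Object discovery experiments (values quoted from the paper).
object_discovery_setup = dict(
    encoder_channels=64,  # doubled encoder/decoder channels
    slot_dim=128,         # doubled slot dimension
    slot_mlp_hidden=256,  # intermediate MLP dimension in slot attention
    epochs=530,           # around 330k gradient update steps
    num_slots=11,         # at most 10 objects per image, plus one
)
# A further setting quoted in the paper uses 8 slots.
```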