Object-Centric Learning with Slot Mixture Module

Authors: Daniil Kirilenko, Vitaliy Vorobyov, Alexey Kovalev, Aleksandr Panov

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct thorough experimental studies to substantiate the results obtained. Through extensive experiments, we show that the proposed Slot Mixture Module achieves state-of-the-art performance in the set property prediction task on the CLEVR dataset (Johnson et al., 2017), outperforming even highly specialized models (Zhang et al., 2019). We provide experimental results for the image reconstruction task on five datasets: four with synthetic images (CLEVR-Mirror (Singh et al., 2022), ShapeStacks (Groth et al., 2018), ClevrTex (Karazija et al., 2021), Bitmoji (Graux, 2021)) and one with real-life images (COCO-2017 (Lin et al., 2014)).
Researcher Affiliation | Collaboration | Daniil Kirilenko (1,2), Vitaliy Vorobyov (1), Alexey K. Kovalev (1,3,4), Aleksandr I. Panov (1,3,4). 1: FRC CSC RAS, Moscow, Russia; 2: Università della Svizzera italiana, Lugano, Switzerland; 3: AIRI, Moscow, Russia; 4: MIPT, Dolgoprudny, Russia. daniil.kirilenko@usi.ch, kovalev@airi.net
Pseudocode | Yes | Algorithm 1: The Slot Mixture Module pseudocode. π is initialized as a uniform categorical distribution; µ and Σ_diag are initialized from Gaussian distributions with trainable parameters.
Open Source Code | Yes | The code is available at https://github.com/AIRI-Institute/smm
Open Datasets | Yes | We consider the following datasets: CLEVR-Mirror (Singh et al., 2022), ClevrTex (Karazija et al., 2021), ShapeStacks (Groth et al., 2018), and COCO-2017 (Lin et al., 2014). CLEVR-Mirror is an extension of the standard CLEVR dataset, which requires capturing global relations between local components due to the presence of a mirror. ShapeStacks tests the ability of the model to describe complex local interactions (multiple objects stacked on each other), and ClevrTex examines the model's capabilities in texture-rich scenes.
Dataset Splits | Yes | In Appendix B, we provide validation cross-entropy curves during training.
Hardware Specification | No | The paper does not explicitly describe the hardware (specific GPU or CPU models) used to run its experiments.
Software Dependencies | No | The paper mentions software such as PyTorch Ignite, the Adam optimizer, and the One Cycle LR scheduler, but does not provide specific version numbers for these or other key software components.
Experiment Setup | Yes | Training conditions with hyperparameters corresponding to a certain dataset are taken from Singh et al. (2022), except that we use a batch size equal to 64 and 2.5 × 10^5 training iterations for all experiments. We use five iterations as a good default choice in the trade-off between best performance and excessive computation. Since the optimal number of steps may depend on the datasets and tasks, we did not perform a hyperparameter search, as this would lead to difficulties in comparing different tasks and datasets. We choose the average number of 5 based on the most related works: Locatello et al. (2020) uses 3, Singh et al. (2022) uses 7 for some datasets and 3 for others, Chang et al. (2022) shows the difference in performance between 1 and 7 iterations. Additional architectural details are presented in Appendix A.
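To make the Pseudocode row concrete: the Slot Mixture Module treats slots as components of a Gaussian mixture over input tokens and refines them with EM-style updates instead of plain Slot Attention. The sketch below is an illustrative NumPy reconstruction under stated assumptions, not the authors' implementation (which adds learned projections and a GRU update; see the linked repository). All function and variable names here are mine; it only shows the initialization described in Algorithm 1 (uniform π, Gaussian-initialized µ and diagonal Σ) and the five-iteration update loop mentioned in the setup.

```python
import numpy as np

def gaussian_log_pdf(x, mu, var):
    # log N(x | mu, diag(var)); x: (N, D), mu: (K, D), var: (K, D) -> (N, K)
    diff = x[:, None, :] - mu[None, :, :]
    return -0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var), axis=-1)

def slot_mixture_step(x, pi, mu, var, eps=1e-6):
    """One EM-style update treating the K slots as GMM components."""
    # E-step: responsibility of each slot for each input token
    log_p = np.log(pi)[None, :] + gaussian_log_pdf(x, mu, var)   # (N, K)
    log_p -= log_p.max(axis=1, keepdims=True)                    # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: refit mixture weights, means, diagonal covariances
    n_k = gamma.sum(axis=0) + eps                                # (K,)
    pi_new = n_k / len(x)
    mu_new = (gamma.T @ x) / n_k[:, None]
    diff = x[:, None, :] - mu_new[None, :, :]
    var_new = np.einsum('nk,nkd->kd', gamma, diff ** 2) / n_k[:, None] + eps
    return pi_new, mu_new, var_new

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))        # N=16 input tokens, D=4 features (toy sizes)
K = 3                               # number of slots
pi = np.full(K, 1.0 / K)            # uniform categorical initialization
mu = rng.normal(size=(K, 4))        # means sampled from a Gaussian
var = np.ones((K, 4))               # diagonal covariances
for _ in range(5):                  # five iterations, the default used in the paper
    pi, mu, var = slot_mixture_step(x, pi, mu, var)
```

After the loop, each slot carries a mean and a diagonal covariance (rather than only a mean, as in vanilla Slot Attention), which is the extra per-slot information the paper exploits downstream.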