Generalization and Robustness Implications in Object-Centric Learning

Authors: Andrea Dittadi, Samuele S Papa, Michele De Vita, Bernhard Schölkopf, Ole Winther, Francesco Locatello

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets and evaluate segmentation metrics and downstream object property prediction. In addition, we study generalization and robustness by investigating the settings where either a single object is out of distribution (e.g., having an unseen color, texture, or shape) or global properties of the scene are altered (e.g., by occlusions, cropping, or increasing the number of objects). From our experimental study, we find object-centric representations to be useful for downstream tasks and generally robust to most distribution shifts affecting objects.
Researcher Affiliation | Collaboration | Andrea Dittadi 1 2, Samuele Papa 1, Michele De Vita 1, Bernhard Schölkopf 2, Ole Winther 1 3 4, Francesco Locatello 5. 1 Technical University of Denmark; 2 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 3 University of Copenhagen; 4 Rigshospitalet, Copenhagen University Hospital; 5 Amazon.
Pseudocode | No | The paper describes the models and their architectures but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | As an additional contribution, we provide a library (footnote 2) for benchmarking object-centric representation learning, which can be extended with more datasets, methods, and evaluation tasks. ... 2: https://github.com/addtt/object-centric-library
Open Datasets | Yes | We then collect five popular multi-object datasets: Multi-dSprites, Objects Room, and Tetrominoes from DeepMind's Multi-Object Datasets collection (Kabra et al., 2019), CLEVR (Johnson et al., 2017), and ShapeStacks (Groth et al., 2018).
Dataset Splits | Yes | For each dataset, we define train, validation, and test splits. The test splits, which always contain at least 2000 images, are exclusively used for evaluation. ... Table 7: Dataset splits, number of foreground and background objects, and number of slots used when training object-centric models. CLEVR6: Train 49483, Validation 2000, Test 2000.
Hardware Specification | Yes | Training and evaluating all the models for the main study requires approximately 1.44 GPU years on NVIDIA V100.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019)" as the implementation library in Appendix A.2, but it does not specify the version number for PyTorch or any other software dependency.
Experiment Setup | Yes | We provide implementation and training details for each model below. ... Table 2: Overview of the main hyperparameter values for MONet. When dataset-specific values are not given, the defaults are used. Optimizer Adam, Learning rate 1e-4, Batch size 64, Training steps 500k, β 0.5.
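The quoted setup details can be collected into a plain configuration dict for reference. This is an illustrative sketch only, not code from the paper's library: the `monet_config` and `clevr6_splits` names are hypothetical, while the values themselves come from the quoted Tables 2 and 7.

```python
# Illustrative only: MONet training defaults as reported in the paper's
# Table 2 (dataset-specific overrides not shown).
monet_config = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "batch_size": 64,
    "training_steps": 500_000,
    "beta": 0.5,  # the β weight in the MONet objective
}

# CLEVR6 split sizes quoted from Table 7.
clevr6_splits = {"train": 49_483, "validation": 2_000, "test": 2_000}

# Consistency check with the quoted claim that test splits
# always contain at least 2000 images.
assert clevr6_splits["test"] >= 2_000
```

Keeping such values in one dict makes it easy to spot, for example, that the reported training budget is 500k steps at batch size 64 regardless of dataset.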