Provable Compositional Generalization for Object-Centric Learning
Authors: Thaddäus Wiedemer, Jack Brady, Alexander Panfilov, Attila Juhos, Matthias Bethge, Wieland Brendel
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we take a step towards addressing this point by investigating theoretically when compositional generalization is possible in object-centric representation learning. We validate our theoretical result and highlight the practical relevance of our assumptions through experiments on synthetic image data. We use this to empirically verify our theoretical results in Sec. 6.1 and find that additive autoencoders that minimize our proposed regularizer on a multi-object dataset are able to generalize compositionally. In Sec. 6.2, we study the importance of our theoretical assumptions for the popular object-centric model Slot Attention (Locatello et al., 2020a) on this dataset. |
| Researcher Affiliation | Academia | ¹University of Tübingen, ²Max Planck Institute for Intelligent Systems, ³Tübingen AI Center, ⁴ELLIS Institute Tübingen |
| Pseudocode | No | The paper does not contain any blocks explicitly labeled as "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | Code at github.com/brendel-group/objects-compositional-generalization |
| Open Datasets | No | The multi-object sprites dataset used in all experiments was generated using DeepMind's Spriteworld renderer (Watters et al., 2019). The training set consists of 100,000 samples, while the ID and OOD test sets each consist of 5,000 samples. Each rendered image is of size 64 × 64 × 3. The paper describes the dataset generation process and cites the Spriteworld renderer, but provides no direct link or DOI to the specific generated dataset, only to the renderer itself. It also describes generating a dataset from specific sampling rules rather than using an existing publicly available one. |
| Dataset Splits | Yes | The training set and ID test set were then sampled uniformly from the resulting region, Z_S, while the OOD test set was sampled uniformly from the complement Z \ Z_S. The resulting training set consists of 100,000 samples, while the ID and OOD test sets each consist of 5,000 samples. (A sampling sketch follows the table.) |
| Hardware Specification | No | This research utilized compute resources at the Tübingen Machine Learning Cloud, DFG FKZ INST 37/1057-1 FUGG. The paper mentions a compute resource but does not specify any particular CPU, GPU, or memory details. |
| Software Dependencies | Yes | Both models were trained using PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | The additive autoencoder is trained for 300 epochs, while Slot Attention is trained for 400 epochs. Both models are trained with a batch size of 64. We optimize both models with AdamW (Loshchilov and Hutter, 2019) with a warmup of eleven epochs. The initial learning rate is set to 1 × 10⁻⁷ and doubles every epoch until it reaches 0.0004. Subsequently, the learning rate is halved every 50 epochs until it returns to 1 × 10⁻⁷. If used, the composition consistency loss is introduced with λ = 1 from epoch 100 onwards for the additive autoencoder and from epoch 150 onwards for Slot Attention. The number of recombined samples z in each forward pass of the consistency loss equals the batch size for all experiments. (A schedule sketch follows the table.) |
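
The split protocol quoted in the Dataset Splits row can be illustrated with rejection sampling. In the sketch below, `in_support` is a hypothetical stand-in for the paper's training region Z_S, which in the paper is defined by its latent sampling rules; only the split sizes and the uniform-sampling logic come from the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 2  # placeholder; the paper's latents include per-object factors

def in_support(z: np.ndarray) -> np.ndarray:
    """Hypothetical membership test for the training region Z_S.
    As an illustrative stand-in, treat Z_S as the lower-left half of [0, 1]^d."""
    return z.sum(axis=1) < 1.0

def sample_uniform(n: int, from_support: bool) -> np.ndarray:
    """Rejection-sample n latents uniformly from Z_S (or its complement Z \\ Z_S)."""
    batches = []
    while sum(len(b) for b in batches) < n:
        z = rng.uniform(0.0, 1.0, size=(4 * n, LATENT_DIM))
        mask = in_support(z)
        batches.append(z[mask] if from_support else z[~mask])
    return np.concatenate(batches)[:n]

train_z = sample_uniform(100_000, from_support=True)    # training set
id_test_z = sample_uniform(5_000, from_support=True)    # ID test set
ood_test_z = sample_uniform(5_000, from_support=False)  # OOD test set
```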
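
The learning-rate rule in the Experiment Setup row can be expressed as a warmup-then-decay schedule. The sketch below is a minimal PyTorch rendering of that description, assuming the scheduler steps once per epoch; the placeholder model and the exact epoch at which the peak is capped are assumptions, since the authors' reference implementation lives in the linked repository.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 10)  # placeholder for the autoencoder / Slot Attention

BASE_LR = 1e-7      # initial learning rate (from the paper)
PEAK_LR = 4e-4      # peak learning rate (from the paper)
WARMUP_EPOCHS = 11  # warmup length (from the paper)

optimizer = AdamW(model.parameters(), lr=BASE_LR)

def lr_lambda(epoch: int) -> float:
    """Multiplier on BASE_LR for the given epoch."""
    if epoch <= WARMUP_EPOCHS:
        # Double the learning rate every epoch, capped at the peak value.
        return min(2.0 ** epoch, PEAK_LR / BASE_LR)
    # After warmup: halve the learning rate every 50 epochs,
    # never dropping below the initial value.
    decay_steps = (epoch - WARMUP_EPOCHS) // 50
    return max((PEAK_LR / BASE_LR) * 0.5 ** decay_steps, 1.0)

scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)

# Usage: call scheduler.step() once at the end of each training epoch.
```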