Compositional Generalization from First Principles
Authors: Thaddäus Wiedemer, Prasanna Mayilvahanan, Matthias Bethge, Wieland Brendel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theory in a range of synthetic experiments and perform several ablation studies that relate our findings to empirical methods (Section 4). We validate our theoretical framework on the multi-sprite data. All models were trained for 2000 epochs on training sets of 100k samples using an NVIDIA RTX 2080 Ti; all test sets contain 10k samples. Table 1 summarizes the reconstruction quality achieved on the in-domain (ID) test set (P) and the entire latent space (Q) for all experiments. |
| Researcher Affiliation | Academia | 1University of Tübingen 2Tübingen AI Center 3Max-Planck-Institute for Intelligent Systems, Tübingen |
| Pseudocode | No | The paper includes schematics of models in Figure 6, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code available at https://github.com/brendel-group/compositional-ood-generalization |
| Open Datasets | Yes | We validate our theoretical framework on the multi-sprite data. We additionally conduct experiments on the CLEVR dataset [35], a popular benchmark for compositional generalization and object-centric learning. |
| Dataset Splits | No | The paper mentions training sets of 100k samples and test sets of 10k samples, and, for CLEVR, setting aside 10% of ID samples for evaluation. However, it does not explicitly define a separate validation split or a methodology for constructing one. |
| Hardware Specification | Yes | All models were trained for 2000 epochs on training sets of 100k samples using an NVIDIA RTX 2080 Ti |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | All models were trained for 2000 epochs on training sets of 100k samples... For training stability, the composition function is implemented as a soft pixel-wise addition using the sigmoid function σ(·) as x = σ(x̃1) x1 + σ(−x̃1) x2. Both models are trained on samples (z, x) from the training set using an MSE reconstruction loss. |
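The soft pixel-wise composition quoted in the Experiment Setup row can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function and argument names (`soft_compose`, `x1_logits`) are hypothetical, and it assumes the per-pixel mixing weights come from a sigmoid over a logit map `x̃1`, so that σ(x̃1) + σ(−x̃1) = 1 makes the output a convex per-pixel blend of the two sprite renderings.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def soft_compose(x1_logits, x1, x2):
    """Soft pixel-wise addition: x = sigmoid(x~1) * x1 + sigmoid(-x~1) * x2.

    Since sigmoid(z) + sigmoid(-z) == 1, each output pixel is a convex
    combination of the corresponding pixels of x1 and x2, weighted by
    how strongly the logit map favors x1 at that pixel.
    """
    alpha = sigmoid(x1_logits)          # per-pixel weight for x1 in [0, 1]
    return alpha * x1 + (1.0 - alpha) * x2  # equals sigmoid(-logits) weight for x2
```

With large positive logits the blend approaches `x1`; with large negative logits it approaches `x2`, which is what makes this a smooth (and hence training-stable) stand-in for a hard pixel-wise occlusion mask.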