Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Compositional Generalization from First Principles
Authors: Thaddäus Wiedemer, Prasanna Mayilvahanan, Matthias Bethge, Wieland Brendel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theory in a range of synthetic experiments and perform several ablation studies that relate our findings to empirical methods (Section 4). We validate our theoretical framework on the multi-sprite data. All models were trained for 2000 epochs on training sets of 100k samples using an NVIDIA RTX 2080 Ti; all test sets contain 10k samples. Table 1 summarizes the reconstruction quality achieved on the in-domain (ID) test set (P) and the entire latent space (Q) for all experiments. |
| Researcher Affiliation | Academia | 1University of Tübingen 2Tübingen AI Center 3Max-Planck-Institute for Intelligent Systems, Tübingen |
| Pseudocode | No | The paper includes schematics of models in Figure 6, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code available at https://github.com/brendel-group/compositional-ood-generalization |
| Open Datasets | Yes | We validate our theoretical framework on the multi-sprite data. We additionally conduct experiments on the CLEVR dataset [35], a popular benchmark for compositional generalization and object-centric learning. |
| Dataset Splits | No | The paper mentions training sets of 100k samples and test sets of 10k samples, and for CLEVR, setting aside 10% of ID samples for evaluation. However, it does not explicitly define a separate validation dataset split or a specific methodology for it. |
| Hardware Specification | Yes | All models were trained for 2000 epochs on training sets of 100k samples using an NVIDIA RTX 2080 Ti |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | All models were trained for 2000 epochs on training sets of 100k samples... For training stability, the composition function is implemented as a soft pixel-wise addition using the sigmoid function $\sigma(\cdot)$ as $x = \sigma(\tilde{x}_1) \cdot \tilde{x}_1 + \sigma(-\tilde{x}_1) \cdot \tilde{x}_2$. Both models are trained on samples (z, x) from the training set using an MSE reconstruction loss. |
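
The soft composition and MSE reconstruction loss quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the function names `soft_compose` and `mse` are hypothetical, the inputs are random placeholders rather than rendered sprites, and the second weight is assumed to be the complement of the first, σ(−x̃₁) = 1 − σ(x̃₁), so that the two weights sum to one at every pixel.

```python
import numpy as np


def sigmoid(a):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))


def soft_compose(x1, x2):
    """Soft pixel-wise composition of two component renderings.

    Assumption: the weight on x2 is sigmoid(-x1) == 1 - sigmoid(x1),
    so component 1 softly occludes component 2 where x1 is large.
    """
    w = sigmoid(x1)
    return w * x1 + (1.0 - w) * x2


def mse(pred, target):
    """Mean squared error between a reconstruction and its target."""
    return float(np.mean((pred - target) ** 2))


# Placeholder "renderings" of two components (hypothetical 4x4 images).
rng = np.random.default_rng(0)
x1 = rng.normal(size=(4, 4))
x2 = rng.normal(size=(4, 4))

x = soft_compose(x1, x2)   # composed image
loss = mse(x, x)           # a perfect reconstruction gives zero loss
```

Where `x1` is strongly positive the output approaches `x1`, and where it is strongly negative the output approaches `x2`. The sigmoid weighting keeps this occlusion behaviour smooth and differentiable, which matches the report's stated purpose of training stability.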