Compositional generalization through abstract representations in human and artificial neural networks
Authors: Takuya Ito, Tim Klinger, Doug Schultz, John Murray, Michael Cole, Mattia Rigotti
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we study the computational properties associated with compositional generalization in both humans and artificial neural networks (ANNs) on a highly compositional task. First, we identified behavioral signatures of compositional generalization in humans, along with their neural correlates using whole-cortex functional magnetic resonance imaging (fMRI) data. Next, we designed pretraining paradigms aided by a procedure we term primitives pretraining to endow compositional task elements into ANNs. We found that ANNs with this prior knowledge had greater correspondence with human behavior and neural compositional signatures. |
| Researcher Affiliation | Collaboration | Takuya Ito, Yale University, taku.ito1@gmail.com; Tim Klinger, IBM Research AI, tklinger@us.ibm.com; Douglas H. Schultz, University of Nebraska-Lincoln, dhschultz@unl.edu; John D. Murray, Yale University, john.murray@yale.edu; Michael W. Cole, Rutgers University, michael.cole@rutgers.edu; Mattia Rigotti, IBM Research AI, mr2666@columbia.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'Public data URL is written in the text (Section 2.1)', but Section 2.1 links to an fMRI dataset, not to source code for the methodology presented in the paper. No other explicit statement or link to source code is provided. |
| Open Datasets | Yes | The fMRI dataset is publicly available here: https://openneuro.org/datasets/ds003701 |
| Dataset Splits | No | The paper describes a sequential learning paradigm where an initial set of 4 practiced contexts is used for training, and then novel contexts are incrementally added. It refers to a 'training set' and 'test set', but does not explicitly define a separate 'validation' dataset split with specific percentages or counts for hyperparameter tuning. |
| Hardware Specification | No | The paper's self-assessment checklist states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]'. However, neither the main text nor the appendices provides specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of 'Adam optimizer [27]' but does not specify version numbers for this or any other software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used in the implementation. |
| Experiment Setup | Yes | The primary ANN architecture had two hidden layers (128 units each) and an output layer composed of four units, each corresponding to one motor response (Fig. 4; see Appendix section A.7 for additional details). Training used a cross-entropy loss function and the Adam optimizer [27]. The ANN transformed a 28-element input vector into a 4-element response vector via Y = f_ReLU(X_h W_h + b_h). Weights and biases were initialized from a uniform distribution U(1/k), where k is the number of input features from the previous layer. |
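
The Experiment Setup quoted above maps onto a small feedforward classifier. The following is a minimal PyTorch sketch, assuming only the details reported in the paper (28-element input, two 128-unit ReLU hidden layers, 4-unit output, cross-entropy loss, Adam optimizer); the learning rate, batch size, and dummy data are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of the feedforward ANN described in the Experiment Setup row.
# Layer sizes, activation, loss, and optimizer follow the quoted text; everything
# else (learning rate, batch construction, data) is an assumption for illustration.
import torch
import torch.nn as nn

class CompositionalTaskANN(nn.Module):
    def __init__(self, n_inputs=28, n_hidden=128, n_outputs=4):
        super().__init__()
        # Two hidden layers of 128 ReLU units and a 4-unit motor-response output.
        # nn.Linear's default init draws weights and biases from U(-1/sqrt(k), 1/sqrt(k)),
        # where k is the fan-in, consistent with the uniform initialization described.
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_outputs),
        )

    def forward(self, x):
        # Return raw logits; nn.CrossEntropyLoss applies log-softmax internally.
        return self.net(x)

model = CompositionalTaskANN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())  # learning rate assumed (default 1e-3)

# Minimal training step on dummy data with the paper's input/output dimensions.
x = torch.rand(16, 28)          # batch of 28-element task-rule input vectors
y = torch.randint(0, 4, (16,))  # target motor responses (4 classes)
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```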