Compositional generalization through abstract representations in human and artificial neural networks

Authors: Takuya Ito, Tim Klinger, Doug Schultz, John Murray, Michael Cole, Mattia Rigotti

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Here we study the computational properties associated with compositional generalization in both humans and artificial neural networks (ANNs) on a highly compositional task. First, we identified behavioral signatures of compositional generalization in humans, along with their neural correlates using whole-cortex functional magnetic resonance imaging (fMRI) data. Next, we designed pretraining paradigms aided by a procedure we term primitives pretraining to endow compositional task elements into ANNs. We found that ANNs with this prior knowledge had greater correspondence with human behavior and neural compositional signatures.
Researcher Affiliation Collaboration Takuya Ito Yale University taku.ito1@gmail.com Tim Klinger IBM Research AI tklinger@us.ibm.com Douglas H. Schultz University of Nebraska-Lincoln dhschultz@unl.edu John D. Murray Yale University john.murray@yale.edu Michael W. Cole Rutgers University michael.cole@rutgers.edu Mattia Rigotti IBM Research AI mr2666@columbia.edu
Pseudocode No The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code No The paper states: 'Public data URL is written in the text (Section 2.1)'. Section 2.1 provides a link to an fMRI dataset, not the source code for the methodology presented in the paper. No other explicit statement or link for source code is provided.
Open Datasets Yes The fMRI dataset is publicly available here: https://openneuro.org/datasets/ds003701
Dataset Splits No The paper describes a sequential learning paradigm where an initial set of 4 practiced contexts is used for training, and then novel contexts are incrementally added. It refers to a 'training set' and 'test set', but does not explicitly define a separate 'validation' dataset split with specific percentages or counts for hyperparameter tuning.
Hardware Specification No The paper's self-assessment checklist answers 'Yes' to 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?'. However, neither the main text nor the appendices provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies No The paper mentions the use of 'Adam optimizer [27]' but does not specify version numbers for this or any other software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used in the implementation.
Experiment Setup Yes The primary ANN architecture had two hidden layers (128 units each) and an output layer of four units corresponding to each motor response (Fig. 4; see Appendix section A.7 for additional details). Training used a cross-entropy loss function and the Adam optimizer [27]. The ANN transformed a 28-element input vector into a 4-element response vector via Y = f_ReLU(X_h W_h + b_h). Weights and biases were initialized from a uniform distribution U(1/k), where k is the number of input features from the previous layer.
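As a minimal sketch (not the authors' released code, since none is available), the described setup can be expressed in NumPy. The layer sizes (28 -> 128 -> 128 -> 4) and ReLU activation come from the text; the symmetric initialization bound +/- sqrt(1/k) is an assumption about how "U(1/k)" is meant.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def init_layer(n_in, n_out, rng):
    # Uniform init scaled by the number of input features k from the
    # previous layer; the symmetric bound +/- sqrt(1/k) is an assumption.
    bound = np.sqrt(1.0 / n_in)
    W = rng.uniform(-bound, bound, size=(n_in, n_out))
    b = rng.uniform(-bound, bound, size=n_out)
    return W, b

def forward(x, layers):
    # Two ReLU hidden layers (128 units each), then a linear 4-unit
    # output layer, one unit per motor response.
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)
    W_out, b_out = layers[-1]
    return h @ W_out + b_out  # logits over the 4 motor responses

rng = np.random.default_rng(0)
sizes = [28, 128, 128, 4]
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
y = forward(rng.uniform(size=28), layers)
print(y.shape)  # (4,)
```

Training (cross-entropy loss with the Adam optimizer, per the paper) would be layered on top of this forward pass, e.g. with an autodiff framework.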