On the generalization capacity of neural networks during generic multimodal reasoning

Authors: Takuya Ito, Soham Dan, Mattia Rigotti, James Kozloski, Murray Campbell

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To assess the generality of this class of models and a variety of other base neural network architectures to multimodal domains, we evaluated and compared their capacity for multimodal generalization. We introduce a multimodal question-answer benchmark to evaluate three specific types of out-of-distribution (OOD) generalization performance: distractor generalization (generalization in the presence of distractors), systematic compositional generalization (generalization to new task permutations), and productive compositional generalization (generalization to more complex task structures).
Researcher Affiliation | Industry | Takuya Ito, Soham Dan, Mattia Rigotti, James Kozloski, & Murray Campbell; T.J. Watson Research Center, IBM Research; {takuya.ito,soham.dan}@ibm.com, mrg@zurich.ibm.com, {kozloski,mcam}@us.ibm.com
Pseudocode | No | The paper describes its models and experimental procedures in detail but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code for this paper and dataset can be found at https://github.com/IBM/gcog.
Open Datasets | Yes | We introduce Generic COG (gCOG), a task abstracted from the previous COG task (Yang et al., 2018). ... Dataset: https://github.com/IBM/gcog
Dataset Splits | Yes | We evaluated distractor generalization on an independent and identically distributed (IID) split and an OOD split. ... Stimuli in the training set were randomly generated with a minimum of one distractor and a maximum of five distractors. ... Here, we trained on a subset of task trees of depth 1 and 3, and then evaluated performance on a novel combination of task structures of depth 3 (Fig. 4d). ... We trained all models on task trees of depth 1 and depth 3, and then evaluated generalization performance to task trees of depth 5 and depth 7 (Fig. 5a). (A minimal sketch of this depth-based split appears after the table.)
Hardware Specification | Yes | All models could be trained in under three days on an NVIDIA K80 GPU, and were trained on IBM's Cognitive Compute Cluster.
Software Dependencies | Yes | Models were constructed using PyTorch version 2.0.0+cu118.
Experiment Setup | Yes | All models were trained using the AdamW optimizer with a learning rate of 0.0001, and the loss was computed as the cross-entropy between the target class and output vector. (A minimal training-step sketch matching these settings follows the table.)
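
The Dataset Splits row describes training on task trees of depth 1 and 3 and evaluating productive compositional generalization on depths 5 and 7. A minimal sketch of such a depth-based split is shown below; it assumes a generic list of trial records, and the function name split_by_depth and the trial field "depth" are illustrative placeholders, not the actual gCOG API from https://github.com/IBM/gcog.

from collections import defaultdict

def split_by_depth(trials, train_depths=(1, 3), eval_depths=(5, 7)):
    """Group trials by task-tree depth, then form train and eval sets."""
    by_depth = defaultdict(list)
    for trial in trials:
        by_depth[trial["depth"]].append(trial)
    # Train on the shallower depths, hold out deeper trees for OOD evaluation.
    train = [t for d in train_depths for t in by_depth[d]]
    evals = {d: by_depth[d] for d in eval_depths}
    return train, evals

# Example usage with toy trial records (field names are hypothetical):
trials = [{"depth": d, "id": i} for i, d in enumerate([1, 3, 5, 7] * 3)]
train_set, eval_sets = split_by_depth(trials)
print(len(train_set), {d: len(v) for d, v in eval_sets.items()})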
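The Experiment Setup row reports AdamW with a learning rate of 0.0001 and a cross-entropy loss. The following is a minimal PyTorch training-step sketch consistent with those settings only; the placeholder model, input shape, and class count are assumptions and do not reproduce the paper's architectures.

import torch
import torch.nn as nn

# Placeholder classifier; the paper's actual model architectures are not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256),
                      nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # reported optimizer and learning rate
criterion = nn.CrossEntropyLoss()  # cross-entropy between target class and output vector

def train_step(images, targets):
    optimizer.zero_grad()
    logits = model(images)             # forward pass
    loss = criterion(logits, targets)  # loss against integer class targets
    loss.backward()                    # backpropagation
    optimizer.step()                   # AdamW parameter update
    return loss.item()

# Example call with random data (batch size and input shape are assumptions):
loss = train_step(torch.randn(8, 3, 64, 64), torch.randint(0, 10, (8,)))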