On the generalization capacity of neural networks during generic multimodal reasoning
Authors: Takuya Ito, Soham Dan, Mattia Rigotti, James Kozloski, Murray Campbell
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To assess the generality of this class of models and a variety of other base neural network architectures to multimodal domains, we evaluated and compared their capacity for multimodal generalization. We introduce a multimodal question-answer benchmark to evaluate three specific types of out-of-distribution (OOD) generalization performance: distractor generalization (generalization in the presence of distractors), systematic compositional generalization (generalization to new task permutations), and productive compositional generalization (generalization to more complex task structures). |
| Researcher Affiliation | Industry | Takuya Ito, Soham Dan, Mattia Rigotti, James Kozloski, & Murray Campbell, T.J. Watson Research Center, IBM Research. {takuya.ito,soham.dan}@ibm.com, mrg@zurich.ibm.com, {kozloski,mcam}@us.ibm.com |
| Pseudocode | No | The paper describes its models and experimental procedures in detail but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code for this paper and dataset can be found at https://github.com/IBM/gcog. |
| Open Datasets | Yes | We introduce Generic COG (gCOG), a task abstracted from the previous COG task (Yang et al., 2018). ... Dataset: https://github.com/IBM/gcog |
| Dataset Splits | Yes | We evaluated distractor generalization on an independent and identically distributed (IID) split and an OOD split. ... Stimuli in the training set were randomly generated with a minimum of one distractor and a maximum of five distractors. ... Here, we trained on a subset of task trees of depth 1 and 3, and then evaluated performance on a novel combination of task structures of depth 3 (Fig. 4d). ... We trained all models on task trees of depth 1 and depth 3, and then evaluated generalization performance to task trees of depth 5 and depth 7 (Fig. 5a). (A hedged sketch of this depth-based split appears below the table.) |
| Hardware Specification | Yes | All models could be trained in under three days on an NVIDIA K80 GPU, and were trained on IBM's Cognitive Compute Cluster. |
| Software Dependencies | Yes | Models were constructed using PyTorch version 2.0.0+cu118. |
| Experiment Setup | Yes | All models were trained using the AdamW optimizer with a learning rate of 0.0001, and the loss was computed as the Cross Entropy between the target class and output vector. (See the optimizer/loss sketch below the table.) |
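
The depth-based compositional splits quoted in the Dataset Splits row amount to partitioning episodes by task-tree depth before training. The snippet below is a minimal, hypothetical sketch of that partitioning; it does not use the gCOG repository's actual API, and the `episodes` list with a `depth` field is an assumed stand-in for however the dataset exposes task-tree depth.

```python
# Hypothetical sketch of the productive-compositionality split described in the
# paper: train on task trees of depth 1 and 3, hold out depth 5 and 7 for OOD
# evaluation. `episodes` is an assumed list of dicts with a "depth" field; the
# real gCOG code may expose this differently.

def split_by_depth(episodes, train_depths=(1, 3), eval_depths=(5, 7)):
    """Partition episodes into train / OOD-eval sets by task-tree depth."""
    train = [ep for ep in episodes if ep["depth"] in train_depths]
    ood_eval = [ep for ep in episodes if ep["depth"] in eval_depths]
    return train, ood_eval


# Toy records standing in for generated gCOG episodes.
episodes = [{"depth": d, "stimulus": None, "target": None} for d in (1, 3, 5, 7)]
train_set, ood_set = split_by_depth(episodes)
print(len(train_set), len(ood_set))  # 2 2
```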
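
The reported optimizer and loss in the Experiment Setup row correspond directly to standard PyTorch components. The sketch below shows that configuration (AdamW, learning rate 0.0001, cross-entropy loss) applied to a placeholder classifier; the model, batch shapes, and number of classes are illustrative assumptions, not the paper's architectures.

```python
import torch
import torch.nn as nn

# Placeholder classifier standing in for the paper's multimodal models.
model = nn.Sequential(nn.Flatten(), nn.Linear(64, 10))

# Optimizer and loss as reported: AdamW with lr=0.0001, cross-entropy objective.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random data (shapes are assumptions).
inputs = torch.randn(8, 64)
targets = torch.randint(0, 10, (8,))

optimizer.zero_grad()
logits = model(inputs)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
```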