On the generalization capacity of neural networks during generic multimodal reasoning

Authors: Takuya Ito, Soham Dan, Mattia Rigotti, James Kozloski, Murray Campbell

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To assess the generality of this class of models and a variety of other base neural network architectures to multimodal domains, we evaluated and compared their capacity for multimodal generalization. We introduce a multimodal question-answer benchmark to evaluate three specific types of out-of-distribution (OOD) generalization performance: distractor generalization (generalization in the presence of distractors), systematic compositional generalization (generalization to new task permutations), and productive compositional generalization (generalization to more complex task structures).
Researcher Affiliation | Industry | Takuya Ito, Soham Dan, Mattia Rigotti, James Kozloski, & Murray Campbell; T.J. Watson Research Center, IBM Research; {takuya.ito,soham.dan}@ibm.com, mrg@zurich.ibm.com, {kozloski,mcam}@us.ibm.com
Pseudocode | No | The paper describes its models and experimental procedures in detail but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code for this paper and dataset can be found at https://github.com/IBM/gcog.
Open Datasets | Yes | We introduce Generic COG (gCOG), a task abstracted from the previous COG task (Yang et al., 2018). ... Dataset: https://github.com/IBM/gcog
Dataset Splits | Yes | We evaluated distractor generalization on an independent and identically distributed (IID) split and an OOD split. ... Stimuli in the training set were randomly generated with a minimum of one distractor and a maximum of five distractors. ... Here, we trained on a subset of task trees of depth 1 and 3, and then evaluated performance on a novel combination of task structures of depth 3 (Fig. 4d). ... We trained all models on task trees of depth 1 and depth 3, and then evaluated generalization performance to task trees of depth 5 and depth 7 (Fig. 5a). (A minimal sketch of this depth-based split appears after the table.)
Hardware Specification | Yes | All models could be trained in under three days on an NVIDIA K80 GPU, and were trained on IBM's Cognitive Compute Cluster.
Software Dependencies | Yes | Models were constructed using PyTorch version 2.0.0+cu118.
Experiment Setup | Yes | All models were trained using the AdamW optimizer with a learning rate of 0.0001, and the loss was computed as the cross-entropy between the target class and output vector. (A minimal training-step sketch matching these settings follows the table.)
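
The Dataset Splits row describes training on task trees of depth 1 and 3 and evaluating productive compositional generalization on depths 5 and 7. A minimal sketch of such a depth-based split is shown below; it assumes a generic list of trial records, and the function name split_by_depth and the trial field "depth" are illustrative placeholders, not the actual gCOG API from https://github.com/IBM/gcog.

from collections import defaultdict

def split_by_depth(trials, train_depths=(1, 3), eval_depths=(5, 7)):
    """Group trials by task-tree depth, then form train and eval sets."""
    by_depth = defaultdict(list)
    for trial in trials:
        by_depth[trial["depth"]].append(trial)
    # Train on the shallower depths, hold out deeper trees for OOD evaluation.
    train = [t for d in train_depths for t in by_depth[d]]
    evals = {d: by_depth[d] for d in eval_depths}
    return train, evals

# Example usage with toy trial records (field names are hypothetical):
trials = [{"depth": d, "id": i} for i, d in enumerate([1, 3, 5, 7] * 3)]
train_set, eval_sets = split_by_depth(trials)
print(len(train_set), {d: len(v) for d, v in eval_sets.items()})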
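The Experiment Setup row reports AdamW with a learning rate of 0.0001 and a cross-entropy loss. The following is a minimal PyTorch training-step sketch consistent with those settings only; the placeholder model, input shape, and class count are assumptions and do not reproduce the paper's architectures.

import torch
import torch.nn as nn

# Placeholder classifier; the paper's actual model architectures are not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256),
                      nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # reported optimizer and learning rate
criterion = nn.CrossEntropyLoss()  # cross-entropy between target class and output vector

def train_step(images, targets):
    optimizer.zero_grad()
    logits = model(images)             # forward pass
    loss = criterion(logits, targets)  # loss against integer class targets
    loss.backward()                    # backpropagation
    optimizer.step()                   # AdamW parameter update
    return loss.item()

# Example call with random data (batch size and input shape are assumptions):
loss = train_step(torch.randn(8, 3, 64, 64), torch.randint(0, 10, (8,)))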