Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Authors: Raeid Saqur, Karthik Narasimhan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate MGN on two tasks: a binary classification task of predicting if a caption matches an image based on attribute compositions in the CLEVR dataset [28], and CLOSURE [6], a recently released challenge for testing systematic generalization in language.
Researcher Affiliation | Collaboration | 1 University of Toronto, Computer Science; 2 Princeton University, Computer Science; 3 Vector Institute for Artificial Intelligence. raeidsaqur@cs.[toronto|princeton].edu; Karthik Narasimhan, Department of Computer Science, Princeton University, karthikn@cs.princeton.edu
Pseudocode | No | The paper describes processes and architectures but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/raeidsaqur/mgn
Open Datasets | Yes | We use images from the CLEVR dataset [28] and use their template generator to produce captions that are both true and false. The original dataset contains 1M questions generated from 100k images with 90 question template families...
Dataset Splits | Yes | All models were trained using Adam with a learning rate of 5×10⁻⁴ and a batch size of 64 for a maximum of 360k iterations, with early stopping based on validation accuracy.
Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory) used for running the experiments is mentioned in the paper.
Software Dependencies | No | The paper mentions "PyTorch Geometric [13]" and the "en_core_web_sm" language model, but does not provide specific version numbers for PyTorch, SpaCy, or PyTorch Geometric itself.
Experiment Setup | Yes | All models were trained using Adam with a learning rate of 5×10⁻⁴ and a batch size of 64 for a maximum of 360k iterations, with early stopping based on validation accuracy. ... A learning rate of 0.01 with weight decay 5×10⁻⁴ was used with the cross-entropy loss function. ... Both the encoder and decoder have hidden layers with a 256-dim hidden vector. We set the dimensions of both the encoder and decoder word vectors to be 300, and the multimodal graph vector representation to be 100. ... We use a learning rate of 1×10⁻⁵ and a batch size of 64 for a maximum of 1,000,000 iterations.
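The quoted hyperparameters can be collected into a minimal, stdlib-only sketch of the training setup. This is illustrative, not the authors' code: the `CONFIG` dict simply records the values quoted above, and the `EarlyStopping` helper (including its `patience` value) is a hypothetical reading of "early stopping based on validation accuracy".

```python
# Hyperparameters quoted from the paper for the caption-matching task.
CONFIG = {
    "optimizer": "Adam",
    "lr": 5e-4,            # "learning rate of 5×10⁻⁴"
    "batch_size": 64,
    "max_iterations": 360_000,
}


class EarlyStopping:
    """Stop training once validation accuracy stops improving.

    Hypothetical helper; the paper does not specify a patience value,
    so patience=5 here is an assumption.
    """

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_rounds = 0

    def step(self, val_acc: float) -> bool:
        """Record one validation result; return True when training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience
```

In a training loop, `stopper.step(val_acc)` would be checked after each validation pass, breaking out of the loop before `CONFIG["max_iterations"]` is reached if accuracy plateaus.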