Supervising the Transfer of Reasoning Patterns in VQA

Authors: Corentin Kervadec, Christian Wolf, Grigory Antipov, Moez Baccouche, Madiha Nadri

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also demonstrate the effectiveness of this approach experimentally on the GQA dataset and show its complementarity to BERT-like self-supervised pre-training."; "5 Experimental results"
Researcher Affiliation | Collaboration | (1) Orange Innovation, France; (2) LIRIS, INSA-Lyon, France; (3) LAGEPP, Université de Lyon, France
Pseudocode | No | The paper describes the method's steps and components (e.g., program decoder), but it does not include any formal pseudocode blocks or algorithm listings.
Open Source Code | No | "We do not include the code, but we provide instructions needed to reproduce our experimental results in Section 3"
Open Datasets | Yes | "We also demonstrate the effectiveness of this approach experimentally on the GQA dataset"; "We use ground truth information from the GQA [15] dataset"; "Evaluation: is performed on GQA [15] and GQA-OOD [18] test sets."
Dataset Splits | Yes | "Our models are trained on the balanced GQA [15] training set (~1M question-answer pairs)."; "Hyper-parameters are selected either on the test-dev (for GQA) or validation (for GQA-OOD) sets."; "Evaluation: is performed on GQA [15] (test-dev and test-std) and GQA-OOD [18] test sets."
Hardware Specification | No | The hardware specifications are deferred to the supplementary material rather than stated in the main paper: "See supp. mat."
Software Dependencies | No | The paper mentions models and architectures (e.g., LXMERT, BERT, faster-RCNN, VinVL, GRU) that imply particular software, but it does not name any software with version numbers needed to reproduce the experiments (e.g., "PyTorch 1.9", "Python 3.8").
Experiment Setup | Yes | "Hyper-parameters are selected either on the test-dev (for GQA) or validation (for GQA-OOD) sets."; "we perform our experiments with a compact version of the Vision-Language (VL)-Transformer used in [30] (cf. Fig. 2), with a hidden embedding size of d=128 and h=4 heads per layer (only 26M trainable parameters)."; "we use faster-RCNN [25] with 36 objects per image."
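
The Experiment Setup row above quotes the only architectural hyper-parameters given in the main paper (hidden size d=128, h=4 heads per layer, 36 faster-RCNN objects per image, about 26M trainable parameters). The sketch below is a minimal, hypothetical PyTorch reconstruction at that scale: the layer count, visual feature dimension, vocabulary size, and single-stream layout are assumptions made for illustration, not the authors' released architecture.

```python
# Minimal sketch of a compact VL-Transformer at the scale reported in the
# "Experiment Setup" row (d=128, h=4 heads, 36 visual objects per image).
# Layer count, feature dimensions, vocabulary size, and the single-stream
# layout are assumptions for illustration, not the authors' released code.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class VLTransformerConfig:
    hidden_size: int = 128       # embedding dimension d reported in the paper
    num_heads: int = 4           # attention heads h per layer
    num_layers: int = 4          # assumed depth (not specified in this section)
    num_objects: int = 36        # faster-RCNN regions per image
    vision_feat_dim: int = 2048  # assumed faster-RCNN feature size
    vocab_size: int = 30522      # assumed BERT-style vocabulary


class CompactVLTransformer(nn.Module):
    """Single-stream sketch: project both modalities to d and run a joint encoder."""

    def __init__(self, cfg: VLTransformerConfig):
        super().__init__()
        self.word_emb = nn.Embedding(cfg.vocab_size, cfg.hidden_size)
        self.obj_proj = nn.Linear(cfg.vision_feat_dim, cfg.hidden_size)
        layer = nn.TransformerEncoderLayer(
            d_model=cfg.hidden_size,
            nhead=cfg.num_heads,
            dim_feedforward=4 * cfg.hidden_size,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=cfg.num_layers)

    def forward(self, token_ids: torch.Tensor, obj_feats: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); obj_feats: (batch, 36, vision_feat_dim)
        tokens = self.word_emb(token_ids)
        objects = self.obj_proj(obj_feats)
        joint = torch.cat([tokens, objects], dim=1)  # concatenate language and vision tokens
        return self.encoder(joint)


if __name__ == "__main__":
    cfg = VLTransformerConfig()
    model = CompactVLTransformer(cfg)
    out = model(
        torch.randint(0, cfg.vocab_size, (2, 20)),
        torch.randn(2, cfg.num_objects, cfg.vision_feat_dim),
    )
    print(out.shape)  # torch.Size([2, 56, 128])
```

Note that this toy model is far smaller than the reported 26M parameters; reaching that count would require the full configuration used by the authors, which is not spelled out in this section.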