Supervising the Transfer of Reasoning Patterns in VQA
Authors: Corentin Kervadec, Christian Wolf, Grigory Antipov, Moez Baccouche, Madiha Nadri
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'We also demonstrate the effectiveness of this approach experimentally on the GQA dataset and show its complementarity to BERT-like self-supervised pre-training.' and Section 5, 'Experimental results' |
| Researcher Affiliation | Collaboration | Orange Innovation, France; LIRIS, INSA-Lyon, France; LAGEPP, Université de Lyon, France |
| Pseudocode | No | The paper describes the method's steps and components (e.g., program decoder), but it does not include any formal pseudocode blocks or algorithm listings. |
| Open Source Code | No | We do not include the code, but we provide instructions needed to reproduce our experimental results in Section 3 |
| Open Datasets | Yes | 'We also demonstrate the effectiveness of this approach experimentally on the GQA dataset', 'We use ground truth information from the GQA [15] dataset', and 'Evaluation: is performed on GQA [15] and GQA-OOD [18] test sets.' |
| Dataset Splits | Yes | 'Our models are trained on the balanced GQA [15] training set (~1M question-answer pairs).', 'Hyper-parameters are selected either on the test-dev (for GQA) or validation (for GQA-OOD) sets.', and 'Evaluation: is performed on GQA [15] (test-dev and test-std) and GQA-OOD [18] test sets.' |
| Hardware Specification | No | The hardware specifications are stated to be in the supplementary material, not directly in the main paper: 'See supp. mat.' |
| Software Dependencies | No | The paper mentions various models and architectures (e.g., LXMERT, BERT, faster-RCNN, VinVL, GRU) that imply software, but it does not specify any software names with version numbers required to reproduce the experiments (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | 'Hyper-parameters are selected either on the test-dev (for GQA) or validation (for GQA-OOD) sets.', 'we perform our experiments with a compact version of the Vision Language (VL)-Transformer used in [30] (cf. Fig. 2), with a hidden embedding size of d=128 and h=4 heads per layer (only 26M trainable parameters)', and 'we use faster-RCNN [25] with 36 objects per image'. |
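
The experiment-setup row pins down only a few architectural constants (d=128, h=4 heads, 36 faster-RCNN regions per image). As a minimal sketch of what such a compact VL-Transformer configuration could look like in PyTorch, the snippet below wires those stated values into a toy encoder; the layer count, visual feature dimension, and question length are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of the compact VL-Transformer configuration quoted above.
# Only d=128, h=4 heads, and 36 visual regions come from the paper excerpt;
# everything else (layer count, feature dims, question length) is a
# HYPOTHETICAL placeholder needed to make the example self-contained.
import torch
import torch.nn as nn

D_MODEL = 128       # hidden embedding size d (from the paper excerpt)
N_HEADS = 4         # attention heads per layer h (from the paper excerpt)
N_OBJECTS = 36      # faster-RCNN regions per image (from the paper excerpt)
N_LAYERS = 6        # HYPOTHETICAL: not stated in the excerpt
VISUAL_DIM = 2048   # HYPOTHETICAL: a common faster-RCNN feature size

class CompactVLTransformer(nn.Module):
    """Toy stand-in for the compact VL-Transformer described in the row above."""
    def __init__(self):
        super().__init__()
        # Project region features down to the shared embedding size.
        self.visual_proj = nn.Linear(VISUAL_DIM, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N_LAYERS)

    def forward(self, visual_feats, text_embeds):
        # Concatenate projected visual tokens with already-embedded text tokens.
        tokens = torch.cat([self.visual_proj(visual_feats), text_embeds], dim=1)
        return self.encoder(tokens)

# Usage: a batch of 2 images with 36 regions each, plus 20 text tokens.
model = CompactVLTransformer()
out = model(torch.randn(2, N_OBJECTS, VISUAL_DIM), torch.randn(2, 20, D_MODEL))
print(out.shape)  # torch.Size([2, 56, 128])
```

This sketch only illustrates the scale of the reported setup; the paper's actual model (cf. its Fig. 2 and reference [30]) may differ in tokenization, cross-modal attention structure, and pre-training.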