Robust Visual Reasoning via Language Guided Neural Module Networks
Authors: Arjun Akula, Varun Jampani, Soravit Changpinyo, Song-Chun Zhu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on VQA and REF tasks demonstrate the effectiveness of our approach. Additionally, we propose a new challenging out-of-distribution test split for the REF task, which we call C3Ref+, for explicitly evaluating the NMN's ability to generalize well to adversarial perturbations and unseen combinations of known concepts. Experiments on C3Ref+ further demonstrate the generalization capabilities of our approach. |
| Researcher Affiliation | Collaboration | Arjun R. Akula (1), Varun Jampani (2), Soravit Changpinyo (2), Song-Chun Zhu (3,4,5): (1) UCLA Center for Vision, Cognition, Learning, and Autonomy; (2) Google Research; (3) Beijing Institute for General Artificial Intelligence (BIGAI); (4) Tsinghua University; (5) Peking University |
| Pseudocode | No | The paper describes algorithms and methods in prose and with mathematical formulas, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include a direct statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use CLEVR [25] as the VQA benchmark, consisting of synthetically generated image and question pairs. Specifically, it consists of 100K images and 860K questions. We train our model on CLEVR train split and evaluate the performance on its val and test splits. In addition, using the model trained on CLEVR, we evaluate the performance on CLOSURE benchmark [13]... We then report results on CLEVR-Ref+ [33]... |
| Dataset Splits | Yes | We train our model on CLEVR train split and evaluate the performance on its val and test splits. We employ early stopping based on validation set accuracy. While reporting accuracies on S-Ref test split, we use the model trained on S-Ref train split. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., specific GPU or CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using Adam [28] as an optimizer, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train PG and the execution engine using Adam [28] with learning rates 0.0005 and 0.0001, respectively. Our PG is trained for a maximum of 32K iterations, while EE is trained for a maximum of 450K iterations. We employ early stopping based on validation set accuracy. |
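Since the paper releases no code, the early-stopping criterion quoted in the Experiment Setup row can only be sketched. Below is a minimal, hypothetical Python illustration: the class name `EarlyStopping`, the `patience` parameter, and its default value are assumptions (the paper does not state a patience); only the two Adam learning rates (0.0005 for the program generator, 0.0001 for the execution engine) come from the paper.

```python
# Learning rates reported in the paper for the program generator (PG)
# and execution engine (EE), both trained with Adam.
PG_LR = 5e-4
EE_LR = 1e-4


class EarlyStopping:
    """Stop training once validation accuracy stops improving.

    A hypothetical helper; the paper only says early stopping is based
    on validation set accuracy, so the patience mechanism is assumed.
    """

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc):
        """Record one validation result; return True if training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per validation pass, and the loop would break as soon as it returns `True`, keeping the checkpoint with the best validation accuracy.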