Modulating early visual processing by language
Authors: Harm de Vries, Florian Strub, Jeremie Mary, Hugo Larochelle, Olivier Pietquin, Aaron C. Courville
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply CBN to a pre-trained Residual Network (ResNet), leading to the MODulatEd ResNet (MODERN) architecture, and show that this significantly improves strong baselines on two visual question answering tasks. Our ablation study confirms that modulating from the early stages of the visual processing is beneficial. |
| Researcher Affiliation | Collaboration | Harm de Vries, University of Montreal, mail@harmdevries.com; Florian Strub, Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL, florian.strub@inria.fr; Jérémie Mary, Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL, jeremie.mary@univ-lille3.fr; Hugo Larochelle, Google Brain, hugolarochelle@google.com; Olivier Pietquin, DeepMind, pietquin@google.com; Aaron Courville, University of Montreal, aaron.courville@gmail.com |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | The source code for our experiments is available at https://github.com/GuessWhatGame. |
| Open Datasets | Yes | In this paper, we focus on VQAv1 dataset [1], which contains 614K questions on 204K images. |
| Dataset Splits | Yes | We train on the training set, do early-stopping on the validation set, and report the accuracies on the test-dev using the evaluation script provided by [1]. |
| Hardware Specification | Yes | We thank NVIDIA for providing access to a DGX-1 machine used in this work. |
| Software Dependencies | No | The paper mentions software components like LSTM, GRU, and ResNet, but does not provide specific version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | The hyperparameters are also provided in Appendix A. |