Modulating early visual processing by language

Authors: Harm de Vries, Florian Strub, Jeremie Mary, Hugo Larochelle, Olivier Pietquin, Aaron C. Courville

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply CBN to a pre-trained Residual Network (ResNet), leading to the MODulatEd ResNet (MODERN) architecture, and show that this significantly improves strong baselines on two visual question answering tasks. Our ablation study confirms that modulating from the early stages of the visual processing is beneficial. (See the CBN sketch after this table.)
Researcher Affiliation | Collaboration | Harm de Vries (University of Montreal, mail@harmdevries.com); Florian Strub (Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL, florian.strub@inria.fr); Jérémie Mary (Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL, jeremie.mary@univ-lille3.fr); Hugo Larochelle (Google Brain, hugolarochelle@google.com); Olivier Pietquin (DeepMind, pietquin@google.com); Aaron Courville (University of Montreal, aaron.courville@gmail.com)
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided.
Open Source Code | Yes | The source code for our experiments is available at https://github.com/GuessWhatGame.
Open Datasets | Yes | In this paper, we focus on the VQAv1 dataset [1], which contains 614K questions on 204K images.
Dataset Splits | Yes | We train on the training set, do early-stopping on the validation set, and report the accuracies on the test-dev using the evaluation script provided by [1].
Hardware Specification | Yes | We thank NVIDIA for providing access to a DGX-1 machine used in this work.
Software Dependencies | No | The paper mentions software components like LSTM, GRU, and ResNet, but does not provide specific version numbers for any software libraries or frameworks used.
Experiment Setup | Yes | The hyperparameters are also provided in Appendix A.
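
The Research Type row refers to Conditional Batch Normalization (CBN), the mechanism MODERN uses to modulate a pre-trained ResNet with a question embedding. Below is a minimal sketch of such a CBN layer, assuming PyTorch; the class name, MLP sizes, and embedding dimensions are illustrative assumptions and are not taken from the authors' released code.

```python
# Minimal sketch of a Conditional Batch Normalization (CBN) layer.
# Assumption: PyTorch; names and sizes are illustrative, not the authors' code.
import torch
import torch.nn as nn


class ConditionalBatchNorm2d(nn.Module):
    """Batch norm whose scale/shift are modulated by a language embedding.

    The base affine parameters (gamma0, beta0) stand in for the frozen
    parameters of a pre-trained ResNet's BN layers; small offsets
    (delta_gamma, delta_beta) are predicted from the question embedding,
    so the effective parameters are gamma0 + delta_gamma and beta0 + delta_beta.
    """

    def __init__(self, num_features: int, lang_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Normalization without its own affine transform; in MODERN the
        # pre-trained statistics and parameters would be kept frozen.
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        # Frozen base parameters (would be copied from the pre-trained BN).
        self.register_buffer("gamma0", torch.ones(num_features))
        self.register_buffer("beta0", torch.zeros(num_features))
        # Trainable MLPs predicting per-channel offsets from the embedding.
        self.delta_gamma = nn.Sequential(
            nn.Linear(lang_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_features))
        self.delta_beta = nn.Sequential(
            nn.Linear(lang_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_features))

    def forward(self, x: torch.Tensor, lang: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map; lang: (B, lang_dim) question embedding.
        h = self.bn(x)
        gamma = self.gamma0 + self.delta_gamma(lang)   # (B, C)
        beta = self.beta0 + self.delta_beta(lang)      # (B, C)
        return gamma.unsqueeze(-1).unsqueeze(-1) * h + beta.unsqueeze(-1).unsqueeze(-1)


if __name__ == "__main__":
    cbn = ConditionalBatchNorm2d(num_features=64, lang_dim=1024)
    feats = torch.randn(2, 64, 56, 56)   # e.g. an early ResNet stage
    question = torch.randn(2, 1024)      # e.g. an LSTM question embedding
    print(cbn(feats, question).shape)    # torch.Size([2, 64, 56, 56])
```

In line with the paper's ablation finding that modulating the early stages of visual processing is beneficial, such a layer would replace the batch-norm layers throughout the pre-trained ResNet (including its early blocks), with only the offset-prediction MLPs and the question encoder being trained.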