Learning to Count Objects in Natural Images for Visual Question Answering

Authors: Yan Zhang, Jonathon Hare, Adam Prügel-Bennett

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model.
Researcher Affiliation | Academia | Yan Zhang & Jonathon Hare & Adam Prügel-Bennett, Department of Electronics and Computer Science, University of Southampton, {yz5n12,jsh2,apb}@ecs.soton.ac.uk
Pseudocode | No | The paper describes the algorithmic steps and equations in the text but does not present a formal pseudocode block or a clearly labeled algorithm figure.
Open Source Code | Yes | Our implementation is available at https://github.com/Cyanogenoid/vqa-counting.
Open Datasets | Yes | On the number category of the VQA v2 Open-Ended dataset (Goyal et al., 2017), a relatively simple baseline model using the counting component outperforms all previous models...
Dataset Splits | Yes | The model is trained for 100 epochs (1697 iterations per epoch to train on the training set, 2517 iterations per epoch to train on both training and validation sets) instead of 100,000 iterations, roughly in line with the doubling of dataset size when going from VQA v1 to VQA v2.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions several techniques and components (e.g., Adam, LSTM, GRU, Batch Normalization) and cites their original papers, but does not specify versions of the programming languages, libraries, or frameworks used (e.g., Python version, PyTorch/TensorFlow version).
Experiment Setup | Yes | They are trained with cross-entropy loss for 1000 iterations using Adam (Kingma & Ba, 2015) with a learning rate of 0.01 and a batch size of 1024. The learning rate is increased from 0.001 to 0.0015 and the batch size is doubled to 256. The model is trained for 100 epochs (1697 iterations per epoch to train on the training set, 2517 iterations per epoch to train on both training and validation sets) instead of 100,000 iterations...
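The Experiment Setup row quotes the paper's toy-task optimizer settings (Adam, learning rate 0.01, 1000 iterations). As a minimal sketch of what that optimizer does, the following implements the standard Adam update rule (Kingma & Ba, 2015) with those quoted hyperparameters on a placeholder quadratic objective; the objective and starting point are illustrative assumptions, not the paper's counting component or data.

```python
import numpy as np

# Minimal Adam update (Kingma & Ba, 2015) with the quoted toy-task
# hyperparameters: lr = 0.01, run for 1000 iterations.
# The quadratic loss (theta - 3)^2 is a placeholder objective,
# NOT the paper's counting component.
def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([5.0])                   # parameter to optimize (placeholder)
m = v = np.zeros_like(theta)
for t in range(1, 1001):                  # quoted: 1000 iterations
    grad = 2 * (theta - 3.0)              # gradient of (theta - 3)^2
    theta, m, v = adam_step(theta, grad, m, v, t)
# theta ends up close to the minimizer at 3.0
```

Because Adam normalizes the gradient by its second-moment estimate, the per-step movement is on the order of the learning rate, so 1000 steps at lr 0.01 comfortably cover the distance from the start to the minimizer in this toy setting.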