Visual Question Answering with Question Representation Update (QRU)
Authors: Ruiyu Li, Jiaya Jia
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is evaluated on challenging datasets of COCO-QA [19] and VQA [2] and yields state-of-the-art performance. |
| Researcher Affiliation | Academia | Ruiyu Li Jiaya Jia The Chinese University of Hong Kong {ryli,leojia}@cse.cuhk.edu.hk |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations (e.g., equations 1-7). However, it does not contain a dedicated pseudocode block, algorithm box, or clearly formatted algorithmic steps. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on COCO-QA [19] and VQA [2]. The COCO-QA dataset is based on Microsoft COCO image data [13]. There are 78,736 training questions and 38,948 test ones, based on a total of 123,287 images. In the VQA dataset, each image from the COCO data is annotated by Amazon Mechanical Turk (AMT) with three questions. There are 248,349, 121,512 and 244,302 questions for training, validation and testing, respectively. |
| Dataset Splits | Yes | For COCO-QA, there are 78,736 training questions and 38,948 test questions. For VQA, there are 248,349, 121,512 and 244,302 questions for training, validation and testing, respectively. |
| Hardware Specification | Yes | We thank NVIDIA for providing Ruiyu Li a Tesla K40 GPU accelerator for this work. |
| Software Dependencies | No | We implement our network using the public Torch computing framework. The paper mentions Torch but does not specify a version number or other software dependencies with their respective versions. |
| Experiment Setup | Yes | The network is trained in an end-to-end fashion using stochastic gradient descent with mini-batches of 100 samples and momentum 0.9. The learning rate starts from 10^-3 and decreases by a factor of 10 when validation accuracy stops improving. We use dropout and gradient clipping to regularize the training process. For the COCO-QA dataset, we set the dimension of common latent space to 1,024. Since the VQA dataset is larger than COCO-QA, we double the dimension of common latent space to adapt to the data and classes. |
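The experiment setup row above can be approximated in code. The following is a minimal PyTorch sketch of the reported optimization recipe (SGD, momentum 0.9, batch size 100, initial learning rate 10^-3 decayed 10x on a validation plateau, dropout, gradient clipping); the paper used the original Lua Torch framework, and the stand-in model, the dropout rate of 0.5, the answer-class count of 430, and the clipping threshold are all assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Stand-in classifier, NOT the QRU architecture: a fused image-question
# embedding in the 1,024-dim common latent space (the COCO-QA setting)
# mapped to answer classes.
model = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),     # dropout rate not stated in the paper; 0.5 assumed
    nn.Linear(1024, 430),  # 430 answer classes is an assumption
)

# SGD with momentum 0.9 and initial learning rate 1e-3, as reported.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Decay the learning rate by 10x when validation accuracy stops improving;
# call scheduler.step(val_accuracy) once per validation pass.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1)

def train_step(features, labels, max_grad_norm=10.0):
    """One SGD step with gradient clipping (clip threshold assumed)."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(features), labels)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()

# One mini-batch of 100 samples, matching the reported batch size.
loss = train_step(torch.randn(100, 1024), torch.randint(0, 430, (100,)))
```

For the VQA runs, the paper doubles the common latent space dimension, which in this sketch would mean replacing 1,024 with 2,048 in the model definition.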