Interpretable Counting for Visual Question Answering
Authors: Alexander Trott, Caiming Xiong, Richard Socher
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, our method outperforms the state of the art architecture for VQA on multiple metrics that evaluate counting. |
| Researcher Affiliation | Industry | Alexander Trott, Caiming Xiong, & Richard Socher Salesforce Research Palo Alto, CA {atrott,cxiong,rsocher}@salesforce.com |
| Pseudocode | No | The paper describes the model and methods using mathematical equations and prose but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | To facilitate future comparison to our work, we have made the training, development, and test question IDs available for download. |
| Open Datasets | Yes | For training and evaluation, we create a new dataset, How Many-QA. It is taken from the counting-specific union of VQA 2.0 (Goyal et al., 2017) and Visual Genome QA (Krishna et al., 2016). |
| Dataset Splits | Yes | The original VQA 2.0 train set includes roughly 444K QA pairs, of which 57,606 are labeled as having a number answer. Focusing on counting questions results in a still very large dataset with 47,542 pairs...we divide the validation data into separate development and test sets. More specifically, we apply the above criteria to the official validation data and select 5,000 of the resulting QA pairs to serve as the test data. The remaining 17,714 QA pairs are used as the development set. (Table 1: Train 83,642, Dev. 17,714, Test 5,000) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions software components like GloVe, LSTM, and Adam optimizer, but it does not specify their version numbers or other software dependencies with version details. |
| Experiment Setup | Yes | When training on counting, we optimize using Adam (Kingma & Ba, 2014). For Soft Count and UpDown, we use a learning rate of 3x10^-4 and decay the learning rate by 0.8 when the training accuracy plateaus. For IRLC, we use a learning rate of 5x10^-4 and decay the learning rate by 0.99999 every iteration. For all models, we regularize using dropout and apply early stopping based on the development set accuracy (see below). ... We weight the entropy penalty P_H and interaction penalty P_I (Eq. 10) both by 0.005 relative to the counting loss. |
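The experiment-setup row describes two distinct learning-rate schedules: plateau-based decay (factor 0.8) for Soft Count and UpDown, and per-iteration exponential decay (factor 0.99999) for IRLC. A minimal sketch in plain Python, assuming no particular framework; the function names are illustrative and not from the paper's (unreleased) code:

```python
# Illustrative sketch of the two learning-rate schedules quoted above.
# These helpers are hypothetical; the paper does not release an implementation.

def plateau_decay(lr, plateaued, factor=0.8):
    """Soft Count / UpDown: multiply the rate by 0.8 when training accuracy plateaus."""
    return lr * factor if plateaued else lr

def per_iteration_decay(base_lr, iteration, gamma=0.99999):
    """IRLC: decay the base rate by 0.99999 at every training iteration."""
    return base_lr * gamma ** iteration

# Baselines start at 3e-4; one plateau event reduces the rate to 0.8 * 3e-4.
baseline_lr = plateau_decay(3e-4, plateaued=True)

# IRLC starts at 5e-4; its rate shrinks smoothly with the iteration count.
irlc_lr = per_iteration_decay(5e-4, iteration=100_000)
```

The two schedules behave quite differently in practice: the plateau rule produces a stepwise rate that only drops when progress stalls, while the per-iteration rule decays continuously but very slowly (0.99999 per step leaves roughly 37% of the base rate after 100k iterations).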