Interpretable Counting for Visual Question Answering

Authors: Alexander Trott, Caiming Xiong, Richard Socher

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, our method outperforms the state of the art architecture for VQA on multiple metrics that evaluate counting.
Researcher Affiliation | Industry | Alexander Trott, Caiming Xiong, & Richard Socher, Salesforce Research, Palo Alto, CA. {atrott,cxiong,rsocher}@salesforce.com
Pseudocode | No | The paper describes the model and methods using mathematical equations and prose but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | To facilitate future comparison to our work, we have made the training, development, and test question IDs available for download.
Open Datasets | Yes | For training and evaluation, we create a new dataset, HowMany-QA. It is taken from the counting-specific union of VQA 2.0 (Goyal et al., 2017) and Visual Genome QA (Krishna et al., 2016).
Dataset Splits | Yes | The original VQA 2.0 train set includes roughly 444K QA pairs, of which 57,606 are labeled as having a number answer. Focusing on counting questions results in a still very large dataset with 47,542 pairs ... we divide the validation data into separate development and test sets. More specifically, we apply the above criteria to the official validation data and select 5,000 of the resulting QA pairs to serve as the test data. The remaining 17,714 QA pairs are used as the development set. (Table 1: Train 83,642, Dev. 17,714, Test 5,000; see the split sketch after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type, memory) used for running experiments.
Software Dependencies | No | The paper mentions software components such as GloVe, LSTM, and the Adam optimizer, but does not specify their version numbers or other software dependencies with version details.
Experiment Setup | Yes | When training on counting, we optimize using Adam (Kingma & Ba, 2014). For SoftCount and UpDown, we use a learning rate of 3×10⁻⁴ and decay the learning rate by 0.8 when the training accuracy plateaus. For IRLC, we use a learning rate of 5×10⁻⁴ and decay the learning rate by 0.99999 every iteration. For all models, we regularize using dropout and apply early stopping based on the development set accuracy (see below). ... We weight the entropy penalty P_H and interaction penalty P_I (Eq. 10) both by 0.005 relative to the counting loss. (See the optimizer sketch after this table.)
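The Dataset Splits and Open Source Code rows note that the authors released the HowMany-QA training, development, and test question IDs rather than the split data itself. The sketch below is a minimal, hypothetical illustration of how such ID lists could be applied to the official VQA 2.0 validation annotations to recover the 17,714-pair development set and 5,000-pair test set; the ID file names are assumptions, since the paper only states that the IDs are available for download.

```python
# Hypothetical reconstruction of the HowMany-QA dev/test split.
# The ID file names below are assumptions; the paper releases the question
# IDs but does not describe a file layout.
import json

def load_ids(path):
    """Read one question ID per line into a set of ints."""
    with open(path) as f:
        return {int(line.strip()) for line in f if line.strip()}

dev_ids = load_ids("dev_question_ids.txt")    # expected: 17,714 IDs
test_ids = load_ids("test_question_ids.txt")  # expected: 5,000 IDs

# Official VQA 2.0 validation annotations (format as published on visualqa.org).
with open("v2_mscoco_val2014_annotations.json") as f:
    val_annotations = json.load(f)["annotations"]

dev_split = [a for a in val_annotations if a["question_id"] in dev_ids]
test_split = [a for a in val_annotations if a["question_id"] in test_ids]

print(len(dev_split), len(test_split))  # should print 17714 and 5000
```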
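For the Experiment Setup row, the following is a minimal PyTorch-style sketch of the quoted optimizer and loss settings. The paper does not name its framework, so torch, the scheduler classes, and the model-name dispatch (soft_count, updown, irlc) are assumptions for illustration, not the authors' implementation.

```python
# Sketch of the quoted training configuration; framework choice (PyTorch)
# and helper structure are assumptions, only the numbers come from the paper.
import torch

def make_optimizer(model, model_name):
    if model_name in ("soft_count", "updown"):
        # Learning rate 3e-4, decayed by 0.8 when training accuracy plateaus.
        opt = torch.optim.Adam(model.parameters(), lr=3e-4)
        sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
            opt, mode="max", factor=0.8)  # step with training accuracy
    elif model_name == "irlc":
        # Learning rate 5e-4, decayed by 0.99999 every iteration.
        opt = torch.optim.Adam(model.parameters(), lr=5e-4)
        sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.99999)
    else:
        raise ValueError(f"unknown model: {model_name}")
    return opt, sched

# Loss weighting from the quoted setup: the entropy penalty P_H and the
# interaction penalty P_I are each weighted by 0.005 relative to the
# counting loss.
def total_loss(counting_loss, p_h, p_i):
    return counting_loss + 0.005 * p_h + 0.005 * p_i
```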