Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Authors: Somak Aditya, Yezhou Yang, Chitta Baral

AAAI 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | 'Experimental analysis of the answers and the key evidential predicates generated on the VQA dataset validate our approach.' |
| Researcher Affiliation | Academia | 'Somak Aditya, Yezhou Yang, Chitta Baral, School of Computing, Informatics and Decision Systems Engineering, Arizona State University, {saditya1,yz.yang,chitta}@asu.edu' |
| Pseudocode | No | The paper describes the model architecture and logical formulations but does not include any explicit pseudocode blocks or algorithm sections. |
| Open Source Code | No | The paper states: 'We intend to make the details about the engine publicly available for further research.' and 'We will make our final answers together with ranked key evidence predicates publicly available for further research.' These are promises of a future release of details or data, not current availability of the code. The link 'visionandreasoning.wordpress.com' is for examples, not code. |
| Open Datasets | Yes | 'MSCOCO-VQA (Antol et al. 2015) is the largest VQA dataset that contains both multiple choices and open-ended questions about arbitrary images collected from the Internet.' |
| Dataset Splits | Yes | 'Specifically, we use 82,783 images for training and 40,504 validation images for testing.' |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper mentions software components used (e.g., a 'pre-trained Dense Captioning system', 'Stanford Dependency parsing', 'word2vec') but does not give version numbers for these or for other relevant software libraries/frameworks. |
| Experiment Setup | No | The paper describes the system components and overall approach but does not provide specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) or training configurations. |
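For readers checking the Dataset Splits row, here is a minimal sketch of how the reported split sizes could be verified against the public MSCOCO-VQA v1 release. The annotation file names and JSON keys below are assumptions about that release, not details taken from the paper.

```python
# Minimal sketch (not from the paper): verify the reported split sizes
# (82,783 training images, 40,504 validation images) using the public
# VQA v1 annotation files. File names and JSON keys are assumptions
# about that release.
import json

def count_images(annotation_path: str) -> int:
    """Count the distinct image ids referenced by a VQA annotation file."""
    with open(annotation_path) as f:
        annotations = json.load(f)["annotations"]
    return len({a["image_id"] for a in annotations})

print(count_images("mscoco_train2014_annotations.json"))  # expected 82783
print(count_images("mscoco_val2014_annotations.json"))    # expected 40504
```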