Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"

Authors: Saeed Amizadeh, Hamid Palangi, Alex Polozov, Yichen Huang, Kazuhito Koishida

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we experimentally demonstrate how we can incorporate our framework for evaluating the visual and the reasoning aspects of VQA in a decoupled manner. To this end, we have performed experiments using our framework and candidate VQA models on the GQA dataset."
Researcher Affiliation | Industry | "¹Microsoft Applied Sciences Group (ASG), Redmond, WA, USA; ²Microsoft Research AI, Redmond, WA, USA. Correspondence to: Saeed Amizadeh <saamizad@microsoft.com>."
Pseudocode | Yes | "Algorithm 1: Question answering in DFOL. Input: question F_Q (binary or open), threshold θ. If F_Q is a binary question, return α(F_Q) > θ; otherwise, let {a_1, ..., a_k} be the plausible answers for F_Q and return argmax_{1 ≤ i ≤ k} α(F_{Q,a_i})." (A runnable Python sketch of this procedure appears after the table.)
Open Source Code | Yes | "The PyTorch code for the ∇-FOL framework is publicly available at https://github.com/microsoft/DFOL-VQA."
Open Datasets | Yes | "To this end, we use the GQA dataset (Hudson & Manning, 2019b) of multi-step functional visual questions."
Dataset Splits | Yes | "The GQA dataset consists of 22M questions defined over 130K real-life images. Each image in the Train/Validation splits is accompanied by the scene graph annotation, and each question in the Train/Validation/Test-Dev splits comes with its equivalent program." (A hedged loading sketch follows the table.)
Hardware Specification | No | "The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models, or cloud computing specifications."
Software Dependencies | No | "The paper mentions 'PyTorch code' and the 'Adam optimizer' but does not specify their version numbers, or versions for other software dependencies needed to replicate the experiments."
Experiment Setup | Yes | "Training setup: For training all ∇-FOL models, we have used the Adam optimizer with learning rate 10⁻⁴ and weight decay 10⁻¹⁰. The dropout ratio is set to 0.1. We have also applied gradient clipping with norm 0.65." (A minimal PyTorch sketch of this setup follows the table.)
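
The Algorithm 1 excerpt above maps directly onto a short decision procedure. Below is a minimal Python sketch of that answering rule; the callable `alpha` is an assumed stand-in for the paper's soft truth-value function α(·) (the real implementation lives in the DFOL-VQA repository), and passing the formula as a `(question, answer)` pair to encode F_{Q,a_i} is likewise an illustrative assumption.

```python
from typing import Callable, Hashable, Optional, Sequence

def answer_question(
    alpha: Callable[[Hashable], float],  # assumed stand-in for α(·): formula -> soft truth value
    question: Hashable,                  # the question formula F_Q
    theta: float = 0.5,                  # threshold θ for binary questions
    plausible_answers: Optional[Sequence[str]] = None,
):
    """Question answering in DFOL, following the quoted Algorithm 1."""
    if plausible_answers is None:
        # Binary question: threshold the soft truth value of F_Q.
        return alpha(question) > theta
    # Open question: evaluate F_{Q, a_i} for each plausible answer a_i
    # and return the highest-scoring answer (the argmax in Algorithm 1).
    return max(plausible_answers, key=lambda a: alpha((question, a)))
```

Here a binary question is signaled by omitting the answer set; the paper instead distinguishes binary from open formulas directly, so treat this dispatch as a simplification.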
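For the dataset-splits row, the description above (scene graphs for Train/Validation images, programs for Train/Validation/Test-Dev questions) suggests a loader along the following lines. The directory layout and file names below are assumptions about a local GQA download, not something the paper specifies.

```python
import json
from pathlib import Path

GQA_ROOT = Path("data/gqa")  # hypothetical local layout

# As described above: scene-graph annotations exist only for Train/Validation,
# and equivalent functional programs only for Train/Validation/Test-Dev.
SPLITS_WITH_SCENE_GRAPHS = {"train", "val"}
SPLITS_WITH_PROGRAMS = {"train", "val", "testdev"}

def load_split(split: str) -> dict:
    """Load questions (and scene graphs, when available) for one GQA split."""
    out = {"questions": json.loads((GQA_ROOT / f"{split}_questions.json").read_text())}
    if split in SPLITS_WITH_SCENE_GRAPHS:
        out["scene_graphs"] = json.loads(
            (GQA_ROOT / f"{split}_sceneGraphs.json").read_text()
        )
    return out
```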
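The experiment-setup row translates directly into PyTorch. The sketch below wires the reported hyperparameters (Adam, learning rate 10⁻⁴, weight decay 10⁻¹⁰, dropout 0.1, gradient clipping at norm 0.65) around a placeholder module; the network itself is not the authors' ∇-FOL architecture.

```python
import torch
from torch import nn

# Placeholder network with the reported dropout ratio of 0.1;
# the real model is the authors' ∇-FOL implementation.
model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(512, 2)
)

# Optimizer exactly as quoted: Adam, learning rate 1e-4, weight decay 1e-10.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-10)

def training_step(inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """One optimization step with the reported gradient-clipping norm."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    # Gradient clipping with norm 0.65, as stated in the quoted setup.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.65)
    optimizer.step()
    return loss.item()
```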