Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
Authors: Saeed Amizadeh, Hamid Palangi, Alex Polozov, Yichen Huang, Kazuhito Koishida
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we experimentally demonstrate how we can incorporate our framework for evaluating the visual and the reasoning aspects of the VQA in a decoupled manner. To this end, we have performed experiments using our framework and candidate VQA models on the GQA dataset. |
| Researcher Affiliation | Industry | ¹Microsoft Applied Sciences Group (ASG), Redmond WA, USA; ²Microsoft Research AI, Redmond WA, USA. Correspondence to: Saeed Amizadeh <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Question answering in DFOL. Input: question F_Q (binary or open), threshold θ. If F_Q is a binary question, then return α(F_Q) > θ; else let {a_1, ..., a_k} be the plausible answers for F_Q and return argmax_{1 ≤ i ≤ k} α(F_{Q,a_i}). |
| Open Source Code | Yes | The PyTorch code for the ∇-FOL framework is publicly available at https://github.com/microsoft/DFOL-VQA. |
| Open Datasets | Yes | To this end, we use the GQA dataset (Hudson & Manning, 2019b) of multi-step functional visual questions. |
| Dataset Splits | Yes | The GQA dataset consists of 22M questions defined over 130K real-life images. Each image in the Train/Validation splits is accompanied by the scene graph annotation, and each question in the Train/Validation/Test-Dev splits comes with its equivalent program. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions 'PyTorch code' and 'Adam optimizer' but does not specify their version numbers or versions for other software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | Training setup: For training all of ∇-FOL models, we have used Adam optimizer with learning rate 10⁻⁴ and weight decay 10⁻¹⁰. The dropout ratio is set to 0.1. We have also applied gradient clipping with norm 0.65. |
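
The decision rule quoted in the Pseudocode row can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the names `answer_question` and `toy_alpha`, the argument layout, the default threshold of 0.5, and the toy scores are all assumptions made for illustration. The scoring function `alpha` stands in for the truth value α that the framework assigns to a question, or to a question paired with a candidate answer.

```python
from typing import Callable, Optional, Sequence, Union


def answer_question(
    alpha: Callable[[str, Optional[str]], float],
    question: str,
    is_binary: bool,
    theta: float = 0.5,  # assumed default; the quoted algorithm only names a threshold
    plausible_answers: Sequence[str] = (),
) -> Union[bool, str]:
    """Sketch of the quoted rule: threshold binary questions, argmax open ones."""
    if is_binary:
        # Binary question: answer "yes" when the score exceeds the threshold.
        return alpha(question, None) > theta
    if not plausible_answers:
        raise ValueError("an open question needs at least one plausible answer")
    # Open question: pick the candidate answer with the highest score.
    return max(plausible_answers, key=lambda a: alpha(question, a))


def toy_alpha(question: str, answer: Optional[str] = None) -> float:
    """Hypothetical stand-in for the model's score; values are made up."""
    scores = {
        ("Is the cup red?", None): 0.8,
        ("What color is the cup?", "red"): 0.7,
        ("What color is the cup?", "blue"): 0.2,
    }
    return scores.get((question, answer), 0.0)


binary_result = answer_question(toy_alpha, "Is the cup red?", is_binary=True)
open_result = answer_question(
    toy_alpha,
    "What color is the cup?",
    is_binary=False,
    plausible_answers=["blue", "red", "green"],
)
```

In this toy setup the binary question is answered with a boolean and the open question with the highest-scoring candidate; in the actual framework the scores would come from the trained model rather than a lookup table.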