Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering
Authors: Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results on the CLEVR and SHAPES datasets verify our hypotheses, showing that the model gets better program (and answer) prediction accuracy even in the low data regime, and allows one to probe the coherence and consistency of reasoning performed. |
| Researcher Affiliation | Collaboration | ¹Facebook AI Research, ²Georgia Tech. |
| Pseudocode | Yes | Algorithm 1 Prob-NMN Training Procedure |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We report our results on the CLEVR (Johnson et al., 2017) dataset and the SHAPES datasets (Andreas et al., 2016). |
| Dataset Splits | Yes | The CLEVR dataset has been extensively used as a benchmark for testing reasoning in VQA models in various prior works (Hu et al., 2017; 2018; Hudson & Manning, 2018; Johnson et al., 2017; Perez et al., 2018; Santoro et al., 2017) and is composed of 70,000 images and around 700K questions, answers and functional programs in the training set, and 15,000 images and 150K questions in the validation set. We choose the first 20K examples from the CLEVR v1.0 validation set and use it as our val set. ... We use train, val, and test splits of 13,568, 1,024, and 1,024 (x, z, i, a) triplets respectively. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using LSTM neural networks but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We repeat this process across five random datasets and report mean and variance at a given level of supervision. ... We run question coding across 5 different runs, pick the best performing model, and then run module training (updating θZ) across 10 different runs. Next, we run the best model from this stage for joint training (sweeping across values of γ ∈ {1, 10, 100}). ... Table 1. Results (in percentage) on the SHAPES dataset with varying amounts of question-program supervision (% x → z), β = 0.1. |
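
The staged protocol quoted in the Experiment Setup row (best of 5 question-coding runs, best of 10 module-training runs, a γ sweep for joint training, repeated over five random datasets with mean and variance reported) can be summarized in a short sketch. This is only an illustration of the selection-and-sweep loop, assuming the structure described above; the stub trainer functions (`train_question_coding`, `train_modules`, `joint_train`) and their random validation scores are hypothetical placeholders, not the authors' released code.

```python
import random
import statistics

# Hypothetical sketch of the Prob-NMN experiment protocol described in the paper's
# setup: pick the best model at each training stage, sweep gamma during joint
# training, and repeat across five random datasets. The stub trainers below just
# return pseudo-random "validation scores" so the sketch is runnable end to end.

GAMMAS = [1, 10, 100]   # joint-training sweep reported in the paper
BETA = 0.1              # KL weight beta used for the SHAPES experiments

def train_question_coding(seed: int, run: int) -> float:
    """Stub for semi-supervised question coding; returns a validation score."""
    return random.Random(hash((seed, "coding", run))).random()

def train_modules(coding_score: float, run: int) -> float:
    """Stub for module training (updating theta_Z) from the best coding model."""
    return coding_score * random.Random(hash(("modules", run))).random()

def joint_train(module_score: float, gamma: float, beta: float) -> float:
    """Stub for joint training at a given gamma (and fixed beta)."""
    return module_score * random.Random(hash(("joint", gamma, beta))).random()

def run_protocol(seed: int) -> float:
    """One experiment on one random dataset: keep the best model per stage."""
    best_coding = max(train_question_coding(seed, run=i) for i in range(5))
    best_modules = max(train_modules(best_coding, run=i) for i in range(10))
    return max(joint_train(best_modules, gamma=g, beta=BETA) for g in GAMMAS)

# Repeat across five random datasets and report mean and variance of the scores.
scores = [run_protocol(seed) for seed in range(5)]
print(f"mean={statistics.mean(scores):.3f}, var={statistics.variance(scores):.3f}")
```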