Learning the Dynamics of Visual Relational Reasoning via Reinforced Path Routing

Authors: Chenchen Jing, Yunde Jia, Yuwei Wu, Chuanhao Li, Qi Wu

AAAI 2022, pp. 1122-1130

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on referring expression comprehension and visual question answering demonstrate the effectiveness of our method.
Researcher Affiliation | Academia | 1 Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, China; 2 Australian Centre for Robotic Vision, University of Adelaide, Australia
Pseudocode | No | The paper describes the methodology and model architecture but does not include structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We use two REC datasets: the CLEVR-Ref+ (Liu et al. 2019b) that is a synthetic diagnostic dataset, and the Ref-reasoning (Yang, Li, and Yu 2020) that contains real images. [...] The challenging GQA dataset (Hudson and Manning 2019a) that contains compositional questions about real-world images is used.
Dataset Splits | Yes | There are a train split and a val split in the CLEVR-Ref+ dataset. [...] The GQA dataset (Hudson and Manning 2019a) ... has a train split for training, a test-dev split for validation, and a test split for online testing.
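
The quoted split statements amount to a small evaluation protocol. Below is a minimal Python sketch that records them as a mapping; the dictionary and its key names are illustrative assumptions for quick reference, not code from the paper (none is released, per the Open Source Code row).

```python
# Split usage exactly as quoted in the Dataset Splits row.
# This mapping and its key names are an assumption for readability,
# not code or configuration released by the authors.
DATASET_SPLITS = {
    "CLEVR-Ref+": {
        "train": "training",
        "val": "evaluation",
    },
    "GQA": {
        "train": "training",
        "test-dev": "validation",
        "test": "online testing",
    },
}
```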
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions software tools like the spaCy tool and various models/detectors, but it does not provide specific version numbers for these or other ancillary software components required for replication.
Experiment Setup | Yes | For the Ref-reasoning, the hyper-parameters µ, λ, and γ are set as 0.01, 0.5, and 0.01. For the CLEVR-Ref+, the three hyper-parameters are set as 0.01, 0.5, and 0.001. The max number of time steps is set as 4 for the Ref-reasoning and 3 for the CLEVR-Ref+. For both datasets, the dimensions of the spatial feature d_b and the common space d are set as 128 and 512, respectively.
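
For quick reference, the quoted setup can be collected into a single configuration sketch. The structure and the ASCII names (mu, lam, gamma) standing in for the paper's µ, λ, and γ are assumptions; only the numeric values come from the quote above.

```python
# Hyper-parameters as quoted in the Experiment Setup row.
# Structure and names are assumptions; values are from the paper's text.
EXPERIMENT_SETUP = {
    "Ref-Reasoning": {
        "mu": 0.01,            # µ
        "lam": 0.5,            # λ
        "gamma": 0.01,         # γ
        "max_time_steps": 4,   # max number of reasoning time steps
    },
    "CLEVR-Ref+": {
        "mu": 0.01,
        "lam": 0.5,
        "gamma": 0.001,
        "max_time_steps": 3,
    },
    # Shared across both datasets:
    "d_b": 128,  # dimension of the spatial feature
    "d": 512,    # dimension of the common space
}
```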