Deconfounded Visual Grounding

Authors: Jianqiang Huang, Yu Qin, Jiaxin Qi, Qianru Sun, Hanwang Zhang

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On popular benchmarks, RED improves various state-of-the-art grounding methods by a significant margin.
Researcher Affiliation | Collaboration | (1) Nanyang Technological University, Singapore; (2) Damo Academy, Alibaba Group; (3) Singapore Management University
Pseudocode | Yes | Algorithm 1: Visual Grounding with RED
Open Source Code | Yes | Code is available at: https://github.com/JianqiangH/DeconfoundedVG.
Open Datasets | Yes | RefCOCO, RefCOCO+ and RefCOCOg are three visual grounding benchmarks and their images are from MS-COCO (Lin et al. 2014).
Dataset Splits | Yes | RefCOCO (Yu et al. 2016) ... is split into train/validation/testA/testB with 120,624/10,834/5,657/5,095 images, respectively.
Hardware Specification | Yes | Under fair settings, we test the speed of Yang's-V1 and Yang's-V1+RED on a single Tesla V100.
Software Dependencies | No | The paper mentions using BERT and K-Means but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | "We deployed the K-Means algorithm to cluster those into N = 10 clusters, forming the confounder dictionary Dg in Eq. (7)." and "After N exceeding 10, the performance won't show further improvement, thus we set N = 10."
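For readers reproducing the Experiment Setup row, the quoted clustering step can be sketched as below. This is a minimal illustration assuming scikit-learn; the function name build_confounder_dictionary and the region_features array are hypothetical stand-ins, not taken from the released code, and the real pipeline clusters the visual features described in the paper.

```python
# Minimal sketch: cluster visual features into N = 10 centers with K-Means
# and keep the centers as the confounder dictionary Dg used in Eq. (7).
# Assumes scikit-learn; `region_features` is a hypothetical stand-in for the
# pooled visual features extracted in the actual pipeline.
import numpy as np
from sklearn.cluster import KMeans


def build_confounder_dictionary(region_features: np.ndarray,
                                n_clusters: int = 10) -> np.ndarray:
    """Return an (n_clusters, feature_dim) array of K-Means centroids."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    kmeans.fit(region_features)
    return kmeans.cluster_centers_


if __name__ == "__main__":
    # Random stand-in for, e.g., 2048-d region features from a visual backbone.
    features = np.random.randn(5000, 2048).astype(np.float32)
    Dg = build_confounder_dictionary(features, n_clusters=10)
    print(Dg.shape)  # -> (10, 2048)
```

The N = 10 choice mirrors the paper's ablation: larger dictionaries did not improve performance further.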