GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Authors: Haozhan Shen, Tiancheng Zhao, Mingwei Zhu, Jianwei Yin

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct main experiments on RefCOCO/+/g datasets for REC and Flickr30k Entities dataset for phrase grounding.
Researcher Affiliation | Academia | Haozhan Shen¹, Tiancheng Zhao²*, Mingwei Zhu¹, Jianwei Yin¹. ¹Zhejiang University, ²Binjiang Institute of Zhejiang University. {hzshen, zhumw, zjuyjw}@zju.edu.cn, tianchez@zju-bj.com
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Our code is available at https://github.com/om-ai-lab/GroundVLP.
Open Datasets | Yes | We adopt three widely used datasets: RefCOCO, RefCOCO+ (Yu et al. 2016) and RefCOCOg (Mao et al. 2016). RefCOCO and RefCOCO+ are both split into validation, test A, and test B sets... We adopt Flickr30k Entities dataset (Plummer et al. 2015) for the task...
Dataset Splits | Yes | RefCOCO and RefCOCO+ are both split into validation, test A, and test B sets, where test A generally contains queries with persons as referring targets and test B contains other types.
Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments (e.g., GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions software tools and models like 'Stanza', 'CLIP', 'Detic', 'VinVL', and 'ALBEF', but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For ALBEF, we use the 3rd layer of the cross-modality encoder for Grad-CAM. For VinVL, we use the 20th layer of the cross-modality encoder and select m = 7... For REC, we set α = 0.5, θ = 0.15 when using ground-truth category and θ = 0.3 for predicted category. For phrase grounding, we set α = 0.25 and θ = 0.15.
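
To make the reported setup easier to scan, the sketch below collects the quoted hyperparameters into a small Python configuration. This is a minimal sketch, assuming the class and field names (GroundVLPConfig, grad_cam_layer, alpha, theta, m) as illustrative placeholders; they are not taken from the released code. Only the numeric values and layer choices come from the paper's Experiment Setup description.

    # Hedged sketch: field names are assumptions; values are those quoted above.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class GroundVLPConfig:
        vlp_model: str            # "ALBEF" or "VinVL"
        grad_cam_layer: int       # cross-modality encoder layer used for Grad-CAM
        alpha: float              # paper's alpha (score-combination weight; exact role assumed)
        theta: float              # paper's theta (threshold; exact role assumed)
        m: Optional[int] = None   # VinVL-specific parameter, m = 7 in the paper

    # REC with ground-truth category (ALBEF backbone).
    albef_rec_gt = GroundVLPConfig("ALBEF", grad_cam_layer=3, alpha=0.5, theta=0.15)
    # REC with predicted category: theta is raised to 0.3.
    albef_rec_pred = GroundVLPConfig("ALBEF", grad_cam_layer=3, alpha=0.5, theta=0.3)
    # Phrase grounding uses a smaller alpha.
    albef_phrase = GroundVLPConfig("ALBEF", grad_cam_layer=3, alpha=0.25, theta=0.15)
    # VinVL backbone: 20th cross-modality layer and m = 7.
    vinvl_rec_gt = GroundVLPConfig("VinVL", grad_cam_layer=20, alpha=0.5, theta=0.15, m=7)

This is only a reading aid for the table; the authors' repository linked above is the authoritative source for how these values are actually used.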