Learning Cross-Modal Context Graph for Visual Grounding
Authors: Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He
AAAI 2020, pp. 11645-11652
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train the entire graph neural network jointly in a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the art by a sizable margin, evidencing the efficacy of our grounding framework. |
| Researcher Affiliation | Academia | Yongfei Liu (1), Bo Wan (1), Xiaodan Zhu (2), Xuming He (1); (1) ShanghaiTech University, (2) Queen's University; {liuyf3, wanbo, hexm}@shanghaitech.edu.cn, xiaodan.zhu@queensu.ca |
| Pseudocode | No | The paper describes the method using text and mathematical equations, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/youngfly11/LCMCG-PyTorch. |
| Open Datasets | Yes | We evaluate our approach on Flickr30K Entities (Plummer et al. 2015) dataset, which contains 32k images, 275k bounding boxes, and 360k noun phrases. |
| Dataset Splits | Yes | We adopt the standard dataset split as in Plummer et al. (2015), which separates the dataset into 30k images for training, 1k for validation and 1k for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions using a pre-trained ResNet-101 network. |
| Software Dependencies | No | The paper mentions 'PyTorch' in the code link and 'SGD optimizer' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The embedding dimension D of phrase and visual representations is set to 1024. In visual graph construction, we select the K = 10 most relevant object candidates for each noun phrase. For model training, we use the SGD optimizer with initial learning rate 5e-2, weight decay 1e-4, and momentum 0.9. We train for 60k iterations in total with batch size 24 and decay the learning rate by a factor of 10 at 20k and 40k iterations, respectively. The loss weights of the regression terms λ1 and λ4 are set to 0.1, while the matching terms λ2 and λ3 are set to 1. During the test stage, we search for an optimal weight β ∈ [0, 1] on the val set and apply it to the test set directly. (A minimal configuration sketch appears below the table.) |
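
The Experiment Setup row maps directly onto a standard PyTorch optimizer and scheduler. Below is a minimal sketch of that configuration, assuming a conventional PyTorch training loop; the `model` stand-in and the four loss-term names are hypothetical placeholders, not the authors' actual code (which lives at https://github.com/youngfly11/LCMCG-PyTorch).

```python
import torch

# Stand-in for the grounding network; the paper's embedding dimension D = 1024.
model = torch.nn.Linear(1024, 1024)

# SGD optimizer with the reported hyperparameters.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=5e-2,           # initial learning rate
    weight_decay=1e-4,
    momentum=0.9,
)

# Decay the learning rate by a factor of 10 at 20k and 40k of the 60k iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[20_000, 40_000], gamma=0.1
)

# Loss weights as reported: regression terms 0.1, matching terms 1.0.
LAMBDA_1, LAMBDA_2, LAMBDA_3, LAMBDA_4 = 0.1, 1.0, 1.0, 0.1

def total_loss(l_reg1, l_match1, l_match2, l_reg2):
    """Weighted sum of the four loss terms (placeholder names)."""
    return (LAMBDA_1 * l_reg1 + LAMBDA_2 * l_match1
            + LAMBDA_3 * l_match2 + LAMBDA_4 * l_reg2)
```

`MultiStepLR` with `gamma=0.1` reproduces the reported "decay by 10 at 20k and 40k iterations" schedule; in the training loop, `scheduler.step()` would be called once per iteration rather than per epoch, since the paper specifies the schedule in iterations.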