Learning Cross-Modal Context Graph for Visual Grounding
Authors: Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He
AAAI 2020, pp. 11645-11652
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train the entire graph neural network jointly in a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the art by a sizable margin, evidencing the efficacy of our grounding framework. |
| Researcher Affiliation | Academia | Yongfei Liu (1), Bo Wan (1), Xiaodan Zhu (2), Xuming He (1); (1) ShanghaiTech University, (2) Queen's University; {liuyf3, wanbo, hexm}@shanghaitech.edu.cn, xiaodan.zhu@queensu.ca |
| Pseudocode | No | The paper describes the method using text and mathematical equations, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/youngfly11/LCMCG-PyTorch. |
| Open Datasets | Yes | We evaluate our approach on Flickr30K Entities (Plummer et al. 2015) dataset, which contains 32k images, 275k bounding boxes, and 360k noun phrases. |
| Dataset Splits | Yes | We adopt the standard dataset split as in Plummer et al. (2015), which separates the dataset into 30k images for training, 1k for validation and 1k for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions using a pre-trained ResNet-101 network. |
| Software Dependencies | No | The paper mentions 'PyTorch' in the code link and 'SGD optimizer' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The embedding dimension D of phrase and visual representations is set to 1024. In visual graph construction, we select the K = 10 most relevant object candidates for each noun phrase. For model training, we use the SGD optimizer with initial learning rate 5e-2, weight decay 1e-4, and momentum 0.9. We train for 60k iterations in total with batch size 24 and decay the learning rate by a factor of 10 at 20k and 40k iterations, respectively. The loss weights of the regression terms λ1 and λ4 are set to 0.1, while the matching terms λ2 and λ3 are set to 1. During the test stage, we search for an optimal weight β ∈ [0, 1] on the val set and apply it to the test set directly. (A minimal configuration sketch appears below the table.) |
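
The Experiment Setup row maps directly onto a standard PyTorch optimizer and scheduler. Below is a minimal sketch of that configuration, assuming a conventional PyTorch training loop; the `model` stand-in and the four loss-term names are hypothetical placeholders, not the authors' actual code (which lives at https://github.com/youngfly11/LCMCG-PyTorch).

```python
import torch

# Stand-in for the grounding network; the paper's embedding dimension D = 1024.
model = torch.nn.Linear(1024, 1024)

# SGD optimizer with the reported hyperparameters.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=5e-2,           # initial learning rate
    weight_decay=1e-4,
    momentum=0.9,
)

# Decay the learning rate by a factor of 10 at 20k and 40k of the 60k iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[20_000, 40_000], gamma=0.1
)

# Loss weights as reported: regression terms 0.1, matching terms 1.0.
LAMBDA_1, LAMBDA_2, LAMBDA_3, LAMBDA_4 = 0.1, 1.0, 1.0, 0.1

def total_loss(l_reg1, l_match1, l_match2, l_reg2):
    """Weighted sum of the four loss terms (placeholder names)."""
    return (LAMBDA_1 * l_reg1 + LAMBDA_2 * l_match1
            + LAMBDA_3 * l_match2 + LAMBDA_4 * l_reg2)
```

`MultiStepLR` with `gamma=0.1` reproduces the reported "decay by 10 at 20k and 40k iterations" schedule; in the training loop, `scheduler.step()` would be called once per iteration rather than per epoch, since the paper specifies the schedule in iterations.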