Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
Authors: Zhou Yu, Jun Yu, Chenchao Xiang, Zhou Zhao, Qi Tian, Dacheng Tao
IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate it on four benchmark datasets. Experimental results demonstrate that our model delivers significant improvements on all the tested datasets |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Technology, Hangzhou Dianzi University, P. R. China; (2) College of Computer Science, Zhejiang University, P. R. China; (3) Department of Computer Science, University of Texas at San Antonio, USA; (4) UBTECH Sydney AI Centre, SIT, FEIT, University of Sydney, Australia |
| Pseudocode | No | The paper describes the proposed models and algorithms in detail, but it does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We evaluate our approach on four benchmark datasets: Flickr30K Entities [Plummer et al., 2015], ReferItGame [Kazemzadeh et al., 2014], RefCOCO and RefCOCO+ [Yu et al., 2016]. These are all commonly used benchmark datasets for visual grounding. |
| Dataset Splits | Yes | Flickr30K Entities... We use the standard split in our setting, i.e., 1k images for validation, 1k for testing, and 30k for training. ReferItGame... We use the same data split as in [Rohrbach et al., 2016], namely 10k images for testing, 9k for training and 1k for validation. RefCOCO and RefCOCO+... The datasets are split into four sets: train, validation, testA and testB. |
| Hardware Specification | No | The paper mentions that DDPN models are trained for up to 30 epochs, which "takes two GPUs 2~3 weeks to finish." However, it does not specify the make, model, or any other details of these GPUs, nor any other hardware components such as CPUs or memory. |
| Software Dependencies | No | The paper mentions software components and models such as VGG-16, ResNet-101, LSTM, the Adam solver, and the Xavier initialization method, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The loss weight γ is set to 1 for all experiments. The dimensionality of the visual feature d_v is 2048 (for ResNet-101) or 4096 (for VGG-16), the dimensionality of the word embedding feature d_e is 300, the dimensionality of the output feature of the LSTM network d_q is 1024, the dimensionality of the fused feature d_o is 512, the threshold of IoU scores η is 0.5, and the number of proposals N is 100... We use the Adam solver to train the model with β1 = 0.9, β2 = 0.99. The base learning rate is set to 0.001 with an exponential decay rate of 0.1. The mini-batch size is set to 64. All the models are trained up to 10k iterations. |
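
The Experiment Setup row quotes enough hyperparameters to reconstruct the training configuration. The sketch below is a minimal reconstruction, assuming PyTorch (the paper does not name its framework); the module and helper names (`QueryEncoder`, `FusionHead`, `xavier_init`) and the vocabulary size are hypothetical, and only the numeric values are taken from the paper.

```python
import torch
import torch.nn as nn

# Dimensionalities quoted in the paper.
D_V = 2048        # visual feature (ResNet-101; 4096 for VGG-16)
D_E = 300         # word embedding
D_Q = 1024        # LSTM output
D_O = 512         # fused feature
GAMMA = 1.0       # loss weight γ

class QueryEncoder(nn.Module):
    """Hypothetical query encoder: word embeddings -> LSTM -> d_q feature."""
    def __init__(self, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, D_E)
        self.lstm = nn.LSTM(D_E, D_Q, batch_first=True)

    def forward(self, tokens):                  # tokens: (B, T) int64
        emb = self.embed(tokens)                # (B, T, D_E)
        _, (h, _) = self.lstm(emb)
        return h[-1]                            # (B, D_Q)

class FusionHead(nn.Module):
    """Hypothetical fusion of proposal visual features with the query
    feature into the d_o-dimensional space, followed by a scalar score."""
    def __init__(self):
        super().__init__()
        self.fuse_v = nn.Linear(D_V, D_O)
        self.fuse_q = nn.Linear(D_Q, D_O)
        self.score = nn.Linear(D_O, 1)

    def forward(self, v, q):                    # v: (B, N, D_V), q: (B, D_Q)
        fused = torch.tanh(self.fuse_v(v) + self.fuse_q(q).unsqueeze(1))
        return self.score(fused).squeeze(-1)    # (B, N) proposal scores

def xavier_init(module):
    """Xavier initialization, as mentioned in the paper."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

encoder = QueryEncoder(vocab_size=10_000)       # vocab size is illustrative
head = FusionHead()
for m in (encoder, head):
    m.apply(xavier_init)

# Adam with the reported betas; exponential learning-rate decay (rate 0.1).
params = list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.99))
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1)

BATCH_SIZE = 64      # mini-batch size from the paper
MAX_ITERS = 10_000   # "trained up to 10k iterations"
```

Note that where the 0.1 decay rate applies is a guess; the paper does not say whether the decay steps per epoch or per iteration.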
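
The same row also fixes the proposal-labeling constants (η = 0.5, N = 100). The snippet below is a conventional IoU-thresholding sketch, not the paper's exact procedure; the helper names are hypothetical.

```python
ETA = 0.5  # IoU threshold η from the paper

def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_proposals(proposals, gt_box):
    """Mark each of the N proposals positive (1) if it overlaps the
    ground-truth box with IoU >= η, negative (0) otherwise."""
    return [1 if iou(p, gt_box) >= ETA else 0 for p in proposals]
```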