Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
Authors: Zhou Yu, Jun Yu, Chenchao Xiang, Zhou Zhao, Qi Tian, Dacheng Tao
IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate it on four benchmark datasets. Experimental results demonstrate that our model delivers significant improvements on all the tested datasets |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Technology, Hangzhou Dianzi University, P. R. China; (2) College of Computer Science, Zhejiang University, P. R. China; (3) Department of Computer Science, University of Texas at San Antonio, USA; (4) UBTECH Sydney AI Centre, SIT, FEIT, University of Sydney, Australia |
| Pseudocode | No | The paper describes the proposed models and algorithms in detail, but it does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We evaluate our approach on four benchmark datasets: Flickr30K Entities [Plummer et al., 2015], ReferItGame [Kazemzadeh et al., 2014], RefCOCO and RefCOCO+ [Yu et al., 2016]. These are all commonly used benchmark datasets for visual grounding. |
| Dataset Splits | Yes | Flickr30K Entities... We use the standard split in our setting, i.e., 1k images for validation, 1k for testing, and 30k for training. ReferItGame... We use the same data split as in [Rohrbach et al., 2016], namely 10k images for testing, 9k for training and 1k for validation. RefCOCO and RefCOCO+... The datasets are split into four sets: train, validation, testA and testB. |
| Hardware Specification | No | The paper mentions that DDPN models are trained for up to 30 epochs, which "takes two GPUs 2~3 weeks to finish." However, it does not specify the make, model, or any other details of these GPUs, nor any other hardware components such as CPUs or memory. |
| Software Dependencies | No | The paper mentions software components and models such as VGG-16, ResNet-101, LSTM, the Adam solver, and the Xavier initialization method, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The loss weight γ is set to 1 for all experiments. The dimensionality of the visual feature d_v is 2048 (for ResNet-101) or 4096 (for VGG-16), the dimensionality of the word embedding feature d_e is 300, the dimensionality of the output feature of the LSTM network d_q is 1024, the dimensionality of the fused feature d_o is 512, the threshold of IoU scores η is 0.5, and the number of proposals N is 100... We use the Adam solver to train the model with β1 = 0.9, β2 = 0.99. The base learning rate is set to 0.001 with an exponential decay rate of 0.1. The mini-batch size is set to 64. All the models are trained up to 10k iterations. |
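
The Experiment Setup row quotes enough hyperparameters to reconstruct the training configuration. The sketch below is a minimal reconstruction, assuming PyTorch (the paper does not name its framework); the module and helper names (`QueryEncoder`, `FusionHead`, `xavier_init`) and the vocabulary size are hypothetical, and only the numeric values are taken from the paper.

```python
import torch
import torch.nn as nn

# Dimensionalities quoted in the paper.
D_V = 2048        # visual feature (ResNet-101; 4096 for VGG-16)
D_E = 300         # word embedding
D_Q = 1024        # LSTM output
D_O = 512         # fused feature
GAMMA = 1.0       # loss weight γ

class QueryEncoder(nn.Module):
    """Hypothetical query encoder: word embeddings -> LSTM -> d_q feature."""
    def __init__(self, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, D_E)
        self.lstm = nn.LSTM(D_E, D_Q, batch_first=True)

    def forward(self, tokens):                  # tokens: (B, T) int64
        emb = self.embed(tokens)                # (B, T, D_E)
        _, (h, _) = self.lstm(emb)
        return h[-1]                            # (B, D_Q)

class FusionHead(nn.Module):
    """Hypothetical fusion of proposal visual features with the query
    feature into the d_o-dimensional space, followed by a scalar score."""
    def __init__(self):
        super().__init__()
        self.fuse_v = nn.Linear(D_V, D_O)
        self.fuse_q = nn.Linear(D_Q, D_O)
        self.score = nn.Linear(D_O, 1)

    def forward(self, v, q):                    # v: (B, N, D_V), q: (B, D_Q)
        fused = torch.tanh(self.fuse_v(v) + self.fuse_q(q).unsqueeze(1))
        return self.score(fused).squeeze(-1)    # (B, N) proposal scores

def xavier_init(module):
    """Xavier initialization, as mentioned in the paper."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

encoder = QueryEncoder(vocab_size=10_000)       # vocab size is illustrative
head = FusionHead()
for m in (encoder, head):
    m.apply(xavier_init)

# Adam with the reported betas; exponential learning-rate decay (rate 0.1).
params = list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.99))
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1)

BATCH_SIZE = 64      # mini-batch size from the paper
MAX_ITERS = 10_000   # "trained up to 10k iterations"
```

Note that where the 0.1 decay rate applies is a guess; the paper does not say whether the decay steps per epoch or per iteration.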
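
The same row also fixes the proposal-labeling constants (η = 0.5, N = 100). The snippet below is a conventional IoU-thresholding sketch, not the paper's exact procedure; the helper names are hypothetical.

```python
ETA = 0.5  # IoU threshold η from the paper

def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_proposals(proposals, gt_box):
    """Mark each of the N proposals positive (1) if it overlaps the
    ground-truth box with IoU >= η, negative (0) otherwise."""
    return [1 if iou(p, gt_box) >= ETA else 0 for p in proposals]
```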