Using Syntax to Ground Referring Expressions in Natural Images

Authors: Volkan Cirik, Taylor Berg-Kirkpatrick, Louis-Philippe Morency

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that GroundNet achieves state-of-the-art accuracy in identifying supporting objects, while maintaining comparable performance in the localization of target objects. Using these additional annotations, our empirical evaluations demonstrate that GroundNet substantially outperforms the state-of-the-art at intermediate predictions of the supporting objects, yet maintains comparable accuracy at target object localization.
Researcher Affiliation | Academia | Volkan Cirik, Taylor Berg-Kirkpatrick, Louis-Philippe Morency; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213; {vcirik,tberg,morency}@cs.cmu.edu
Pseudocode | Yes | Algorithm 1: Generate Computation Graph (a hedged sketch of this idea appears after the table)
Open Source Code | Yes | Our annotations for supporting objects and implementations are available for public use: https://github.com/volkancirik/groundnet
Open Datasets | Yes | We use the standard Google-Ref (Mao et al. 2016) benchmark for our experiments. We additionally present a new set of annotations on the Google-Ref dataset. Our annotations for supporting objects and implementations are available for public use.
Dataset Splits | Yes | best validation split, which is 2.5% of the training data separated from the training split (a split sketch appears after the table)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) were mentioned for running experiments.
Software Dependencies | No | The paper mentions software components like GloVe, Faster-RCNN, the VGG-16 network, the Stanford Parser, LSTMs, and Xavier initialization, but no specific version numbers are provided for any of these dependencies.
Experiment Setup | Yes | We trained GroundNet with backpropagation. We used stochastic gradient descent for 6 epochs with an initial learning rate of 0.01, multiplied by 0.4 after each epoch. The hidden layer size of the LSTM networks was searched over the range {64, 128, ..., 1024} and picked based on the best validation split, which is 2.5% of the training data separated from the training split. We initialized all parameters of the model with Xavier initialization (Glorot and Bengio 2010) and used a weight decay rate of 0.0005 as regularization. (A code sketch of this schedule follows the table.)
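
The Pseudocode row refers to Algorithm 1 (Generate Computation Graph), which maps the constituency parse of a referring expression to a computation graph of neural modules. Since the algorithm itself is not quoted above, here is a minimal, hypothetical sketch of that idea; the module names Locate and Combine, the flat node list, and the nltk-based tree handling are all assumptions, not the paper's algorithm verbatim.

```python
# Hypothetical sketch of generating a computation graph from a
# constituency parse; module names and graph layout are assumptions,
# not the paper's Algorithm 1 verbatim.
from nltk.tree import Tree

def generate_computation_graph(node, graph=None):
    """Recursively map a parse tree to a flat list of (module, arg, children)."""
    if graph is None:
        graph = []
    if isinstance(node, str):  # leaf token -> a grounding module over the word
        graph.append(("Locate", node, []))
        return len(graph) - 1, graph
    child_ids = []
    for child in node:  # recurse into the constituents of this node
        cid, graph = generate_computation_graph(child, graph)
        child_ids.append(cid)
    # internal constituent -> a composition module over its children
    graph.append(("Combine", node.label(), child_ids))
    return len(graph) - 1, graph

tree = Tree.fromstring(
    "(NP (NP (DT the) (NN man)) (PP (IN on) (NP (DT the) (NN bench))))"
)
root, graph = generate_computation_graph(tree)
for i, (module, arg, children) in enumerate(graph):
    print(i, module, arg, children)
```

The graph is emitted bottom-up, so each module's inputs already exist in the list by the time the module itself is appended.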
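
The 2.5% validation split in the Dataset Splits row could be carved out as follows. This is a sketch under the assumption of a simple random split; the placeholder list stands in for the actual Google-Ref training data.

```python
# Sketch of separating a 2.5% validation split from the training data.
# The random split and the placeholder data are assumptions.
import random

train_examples = ["example"] * 1000  # placeholder for Google-Ref training examples

random.seed(0)
indices = list(range(len(train_examples)))
random.shuffle(indices)
n_val = int(0.025 * len(indices))  # 2.5% held out for validation
val_set = [train_examples[i] for i in indices[:n_val]]
train_set = [train_examples[i] for i in indices[n_val:]]
```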
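
Finally, for the Experiment Setup row, the following is a minimal sketch of the reported optimization schedule, assuming a PyTorch implementation. The stand-in LSTM and the input size of 300 (a common GloVe dimensionality) are assumptions; the learning rate, decay factor, epoch count, weight decay, and Xavier initialization are taken from the quoted setup.

```python
# Minimal sketch of the reported training schedule, assuming PyTorch.
# The stand-in LSTM and input size are assumptions; the hyperparameters
# below are the ones quoted in the Experiment Setup row.
import torch
import torch.nn as nn

# Hidden size was searched over {64, 128, ..., 1024} on the 2.5% validation split.
model = nn.LSTM(input_size=300, hidden_size=512)

# Xavier initialization for all weight matrices (Glorot and Bengio 2010).
for name, param in model.named_parameters():
    if "weight" in name:
        nn.init.xavier_uniform_(param)

# SGD with an initial learning rate of 0.01 and weight decay 0.0005.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.0005)
# Multiply the learning rate by 0.4 after each epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.4)

for epoch in range(6):
    # ... iterate over training batches here: forward pass, loss.backward(),
    # optimizer.step(), optimizer.zero_grad() ...
    scheduler.step()
```

Calling `scheduler.step()` once per epoch applies the 0.4 multiplier at epoch boundaries, matching the described schedule.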