Using Syntax to Ground Referring Expressions in Natural Images
Authors: Volkan Cirik, Taylor Berg-Kirkpatrick, Louis-Philippe Morency
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using these additional annotations, our empirical evaluations demonstrate that GroundNet substantially outperforms the state-of-the-art at intermediate predictions of the supporting objects, yet maintains comparable accuracy at target object localization. |
| Researcher Affiliation | Academia | Volkan Cirik, Taylor Berg-Kirkpatrick, Louis-Philippe Morency School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 {vcirik,tberg,morency}@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1: Generate Computation Graph |
| Open Source Code | Yes | Our annotations for supporting objects and implementations are available for public use1. 1https://github.com/volkancirik/groundnet |
| Open Datasets | Yes | We use the standard Google-Ref (Mao et al. 2016) benchmark for our experiments. We additionally present a new set of annotations on Google-Ref dataset. Our annotations for supporting objects and implementations are available for public use1. |
| Dataset Splits | Yes | best validation split, which is 2.5% of training data separated from the training split. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) were mentioned for running experiments. |
| Software Dependencies | No | The paper mentions software components like GloVe, Faster-RCNN, VGG-16 network, Stanford Parser, LSTM, and Xavier initialization, but no specific version numbers are provided for any of these dependencies. |
| Experiment Setup | Yes | We trained GroundNet with backpropagation. We used stochastic gradient descent for 6 epochs with an initial learning rate of 0.01, multiplied by 0.4 after each epoch. The hidden layer size of the LSTM networks was searched over the range {64, 128, ..., 1024} and picked based on the best validation split, which is 2.5% of training data separated from the training split. We initialized all parameters of the model with Xavier initialization (Glorot and Bengio 2010) and used a weight decay rate of 0.0005 as regularization. |
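The learning-rate schedule quoted in the Experiment Setup row (SGD for 6 epochs, initial rate 0.01, multiplied by 0.4 after each epoch) can be sketched as a simple multiplicative decay. This is an illustrative reconstruction, not the authors' code; the helper name `learning_rate` is ours.

```python
# Sketch of the reported schedule: base learning rate 0.01,
# decayed by a factor of 0.4 after each of the 6 training epochs.
# Epochs are 0-indexed, so epoch 0 uses the initial rate.

def learning_rate(epoch, base_lr=0.01, decay=0.4):
    """Learning rate in effect during the given (0-indexed) epoch."""
    return base_lr * decay ** epoch

if __name__ == "__main__":
    for epoch in range(6):
        print(f"epoch {epoch}: lr = {learning_rate(epoch):.8f}")
```

Under this reading, the rate falls from 0.01 in the first epoch to roughly 1e-4 by the final one; the paper does not state whether decay was applied before or after the last epoch, so the exact final value is an assumption.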