Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Using Syntax to Ground Referring Expressions in Natural Images
Authors: Volkan Cirik, Taylor Berg-Kirkpatrick, Louis-Philippe Morency
AAAI 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using these additional annotations, our empirical evaluations demonstrate that GroundNet substantially outperforms the state-of-the-art at intermediate predictions of the supporting objects, yet maintains comparable accuracy at target object localization. |
| Researcher Affiliation | Academia | Volkan Cirik, Taylor Berg-Kirkpatrick, Louis-Philippe Morency School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 EMAIL |
| Pseudocode | Yes | Algorithm 1: Generate Computation Graph |
| Open Source Code | Yes | Our annotations for supporting objects and implementations are available for public use1. 1https://github.com/volkancirik/groundnet |
| Open Datasets | Yes | We use the standard Google-Ref (Mao et al. 2016) benchmark for our experiments. We additionally present a new set of annotations on Google-Ref dataset. Our annotations for supporting objects and implementations are available for public use1. |
| Dataset Splits | Yes | best validation split, which is 2.5% of training data separated from the training split. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) were mentioned for running experiments. |
| Software Dependencies | No | The paper mentions software components like GloVe, Faster-RCNN, VGG-16 network, Stanford Parser, LSTM, and Xavier initialization, but no specific version numbers are provided for any of these dependencies. |
| Experiment Setup | Yes | We trained GroundNet with backpropagation. We used stochastic gradient descent for 6 epochs with an initial learning rate of 0.01, multiplied by 0.4 after each epoch. The hidden layer size of the LSTM networks was searched over the range {64, 128, ..., 1024} and picked based on the best validation split, which is 2.5% of training data separated from the training split. We initialized all parameters of the model with Xavier initialization (Glorot and Bengio 2010) and used a weight decay rate of 0.0005 as regularization. |
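The reported training schedule (initial learning rate 0.01, multiplied by 0.4 after each epoch) and Xavier initialization can be sketched as plain functions. This is a minimal illustration of the described hyperparameters, not code from the authors' repository; the function names are our own.

```python
import math
import random

def lr_schedule(epoch, base_lr=0.01, decay=0.4):
    """Learning rate at a given epoch, per the paper's setup:
    start at 0.01 and multiply by 0.4 after each epoch."""
    return base_lr * decay ** epoch

def xavier_uniform(fan_in, fan_out, rng=random):
    """Xavier/Glorot uniform initialization (Glorot and Bengio 2010):
    draw weights from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_out)] for _ in range(fan_in)]

# Learning rates over the 6 reported epochs: 0.01, 0.004, 0.0016, ...
rates = [lr_schedule(e) for e in range(6)]
```

In a framework such as PyTorch, the same schedule would typically be expressed with an SGD optimizer (`weight_decay=0.0005`) and a per-epoch multiplicative decay of 0.4.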