Modeling Conceptual Understanding in Image Reference Games

Authors: Rodolfo Corona Rodriguez, Stephan Alaniz, Zeynep Akata

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on three benchmark image/attribute datasets suggest that our learner indeed encodes information directly pertaining to the understanding of other agents, and that leveraging this information is crucial for maximizing gameplay performance.
Researcher Affiliation | Academia | Rodolfo Corona, UC Berkeley (rcorona@berkeley.edu); Stephan Alaniz, Max Planck Institute for Informatics (salaniz@mpi-inf.mpg.de); Zeynep Akata, University of Tübingen (zeynep.akata@uni-tuebingen.de)
Pseudocode | No | The paper describes algorithms in prose and mathematical formulas, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code with full specifications for experiments may be found at: https://github.com/rcorona/conceptual_img_ref
Open Datasets | Yes | We use the AwA2 [Xian et al., 2018], SUN Attribute [Patterson et al., 2014], and CUB [Wah et al., 2011] datasets.
Dataset Splits | Yes | We use the standard splits for CUB and SUN, but make our own split for AwA2 in order to have all classes represented in both train and test. The training splits are used for learning speaker parameters; we present performance on the test splits, using the same splits for each seed.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions models like ResNet-152 and PNASNet-5, and pre-training on ImageNet, but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | Unless stated otherwise, the listener population consists of 25 clusters, each with 100 listeners. For all curves we plot the average over 3 random seeds, with error curves representing one standard deviation. We estimate the value of using each attribute to describe the target image, i.e. V(s_k, a_k) : R^d × A → R, using episodes from both the practice and evaluation phases, optimizing the following loss: L_V = (1/(N+M)) Σ_{k=1}^{N+M} MSE(V(s_k, a_k), r_k)  (2)
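The value loss in Eq. 2 averages a squared error over the N practice and M evaluation episodes. The sketch below illustrates that computation with a simple linear value function; the dimensions, the linear parameterization, and all variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d-dimensional image state, |A| discrete attributes.
d, num_attrs = 16, 10
W = rng.normal(scale=0.1, size=(num_attrs, d))  # one linear scorer per attribute


def value(s, a):
    """Sketch of V(s_k, a_k) : R^d x A -> R as a per-attribute linear score."""
    return W[a] @ s


# Episodes from both phases: N practice + M evaluation, each with a state,
# a chosen attribute, and an observed reward r_k.
N, M = 6, 2
states = rng.normal(size=(N + M, d))
attrs = rng.integers(0, num_attrs, size=N + M)
rewards = rng.random(N + M)

# Eq. 2: L_V = 1/(N+M) * sum_k MSE(V(s_k, a_k), r_k)
preds = np.array([value(s, a) for s, a in zip(states, attrs)])
L_V = np.mean((preds - rewards) ** 2)
```

Because the loss is a plain mean of per-episode squared errors, practice and evaluation episodes contribute with equal weight.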