Modeling Conceptual Understanding in Image Reference Games
Authors: Rodolfo Corona Rodriguez, Stephan Alaniz, Zeynep Akata
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on three benchmark image/attribute datasets suggest that our learner indeed encodes information directly pertaining to the understanding of other agents, and that leveraging this information is crucial for maximizing gameplay performance. |
| Researcher Affiliation | Academia | Rodolfo Corona UC Berkeley rcorona@berkeley.edu Stephan Alaniz Max Planck Institute for Informatics salaniz@mpi-inf.mpg.de Zeynep Akata University of Tübingen zeynep.akata@uni-tuebingen.de |
| Pseudocode | No | The paper describes algorithms in prose and mathematical formulas, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code with full specifications for experiments may be found at: https://github.com/rcorona/conceptual_img_ref |
| Open Datasets | Yes | We use the AwA2 [Xian et al., 2018], SUN Attribute [Patterson et al., 2014], and CUB [Wah et al., 2011] datasets. |
| Dataset Splits | Yes | We use the standard splits for CUB and SUN, but make our own split for AwA2 in order to have all classes represented in both train and test. The training splits are used for learning speaker parameters; we present performance on the test splits, using the same splits for each seed. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions models like ResNet-152 and PNASNet-5, and pre-training on ImageNet, but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Unless stated otherwise, the listener population consists of 25 clusters, each with 100 listeners. For all curves we plot the average over 3 random seeds, with error curves representing one standard deviation. We estimate the value of using each attribute to describe the target image, i.e. $V(s_k, a_k): \mathbb{R}^d \times \mathcal{A} \rightarrow \mathbb{R}$, using episodes from both the practice and evaluation phases, optimizing the following loss: $\mathcal{L}_V = \frac{1}{N+M} \sum_{k=1}^{N+M} \mathrm{MSE}\big(V(s_k, a_k), r_k\big)$ (2) |
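
The value loss quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustrative implementation, not the authors' released code: the function name `value_loss` and the toy episode values are hypothetical, and only the loss definition itself (mean squared error between value estimates $V(s_k, a_k)$ and observed rewards $r_k$ over all $N + M$ practice and evaluation episodes) comes from the paper.

```python
import numpy as np

def value_loss(value_estimates: np.ndarray, rewards: np.ndarray) -> float:
    """Eq. (2): mean squared error between the predicted value of using
    each attribute, V(s_k, a_k), and the observed gameplay reward r_k,
    averaged over all N + M practice and evaluation episodes."""
    assert value_estimates.shape == rewards.shape
    return float(np.mean((value_estimates - rewards) ** 2))

# Toy example: 3 practice + 2 evaluation episodes (N + M = 5).
v = np.array([0.9, 0.1, 0.5, 0.8, 0.2])  # V(s_k, a_k) for each episode k
r = np.array([1.0, 0.0, 1.0, 1.0, 0.0])  # observed rewards r_k
loss = value_loss(v, r)  # ≈ 0.07
```

In practice the value function would be a learned network and this loss would be minimized with a gradient-based optimizer; the sketch above only demonstrates the quantity being optimized.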