Bridging semantics and pragmatics in information-theoretic emergent communication
Authors: Eleonora Gualdoni, Mycal Tucker, Roger Levy, Noga Zaslavsky
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We test this approach in a rich visual domain of naturalistic images, and find that key human-like properties of the lexicon emerge when agents are guided by both context-specific utility and general communicative pressures, suggesting that both aspects are crucial for understanding how language may evolve in humans and in artificial agents." and "We train and test our agents on the Many Names dataset [18]." |
| Researcher Affiliation | Collaboration | Eleonora Gualdoni (Apple MLR; Universitat Pompeu Fabra) e_gualdoni@apple.com; Mycal Tucker (MIT) mycal@mit.edu; Roger P. Levy (MIT) rplevy@mit.edu; Noga Zaslavsky (NYU) nogaz@nyu.edu |
| Pseudocode | No | The paper describes the model's architecture and training objective verbally and with diagrams (Figure 2), but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our code is available at https://github.com/InfoCogLab/info-sem-prag-neurips2024. |
| Open Datasets | Yes | "We train and test our agents on the Many Names dataset [18], which contains 25K naturalistic images (see Figure 1 for example), each with one target object, appearing in a bounding box, annotated with 36 names provided by English native speakers asked to freely produce a name (one word) to describe the object. The name produced by the majority of the annotators for a target object is called the topname." (Footnote 2: Creative Commons Attribution 4.0 International License.) A minimal sketch of the topname computation appears after this table. Reference [18]: Carina Silberer, Sina Zarrieß, and Gemma Boleda. Object naming in language and vision: A survey and a new dataset. In Proceedings of LREC, pages 5792–5801, Marseille, France, 2020. European Language Resources Association. |
| Dataset Splits | No | "In all our experiments, we trained agents on 70% of the images (randomly sampled) in self-play, with no human supervision, batch size of 128, and codebook initialized with 3000 trainable communication vectors (see Appendix A for further details)." The paper mentions "test-time expected utility on unseen images" but does not explicitly detail a validation split for hyperparameter tuning. |
| Hardware Specification | Yes | Experiments were run on a cluster with 12 nodes with 5 NVIDIA A30 GPUs and 48 CPUs each. |
| Software Dependencies | No | The paper mentions using specific models like ResNet18, ResNet-101, Faster R-CNN, and VQ-VIB, but it does not specify the version numbers of software libraries or frameworks (e.g., PyTorch, TensorFlow, Python versions) used for implementation. |
| Experiment Setup | Yes | "We trained 270 agent pairs with a range of combinations of λU, λC and λI. We used parameter annealing which... We trained agents on 70% of the images (randomly sampled) in self-play, with no human supervision, batch size of 128, and codebook initialized with 3000 trainable communication vectors (see Appendix A for further details)." From Appendix A: "We started by training our agents with λU = 1 until convergence, i.e. for 10K epochs. After that, we first annealed models by keeping λC fixed at 0, while gradually decreasing the value of λU and increasing the value of λI, until reaching λI = 1. Then, for each trained value of λU, we gradually annealed λC. For each annealing step, we trained until reaching variance in the training objective lower than 0.0001 for the latest 1K epochs (as a criterion for convergence). We followed a non-uniform annealing schedule... Agents were trained with batch size 128, hyperparameter β of the VQ architecture set at 1. See Tucker et al. [38] for further details about the architecture and hyperparameters." A sketch of this annealing schedule is given after the table. |
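
The topname definition quoted in the Open Datasets row lends itself to a direct computation: for each target object, take the most frequent of the freely produced names. Below is a minimal sketch; the annotation record and field layout are hypothetical and illustrative, not the Many Names dataset's actual schema.

```python
from collections import Counter

# Illustrative annotation record in the spirit of the Many Names dataset:
# each target object comes with (up to) 36 freely produced one-word names.
# The dict layout below is hypothetical, not the dataset's real format.
annotations = {
    "image_0001": ["dog", "dog", "puppy", "dog", "animal"],
}

def topname(names):
    """Return the name produced by the largest share of annotators."""
    return Counter(names).most_common(1)[0][0]

print(topname(annotations["image_0001"]))  # -> dog
```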
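
The annealing procedure quoted from Appendix A can be read as a nested schedule over the objective weights λU, λI, and λC. The sketch below is our reading of that schedule, not the authors' code: `train_until_converged` is a hypothetical placeholder that simulates an objective value rather than running self-play training, and the step sizes are illustrative, since the paper states the actual schedule was non-uniform.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_until_converged(lam_u, lam_i, lam_c, window=1_000, tol=1e-4, max_epochs=10_000):
    """Train with fixed (λU, λI, λC) until the variance of the training
    objective over the latest 1K epochs drops below 0.0001 (the convergence
    criterion quoted from Appendix A). The epoch body is a placeholder:
    it simulates a noisy, decaying objective instead of real self-play."""
    history = []
    for epoch in range(max_epochs):
        objective = float(np.exp(-epoch / 500.0) + 1e-3 * rng.standard_normal())
        history.append(objective)
        if len(history) >= window and np.var(history[-window:]) < tol:
            break
    return history

# Phase 1: pure utility pressure (λU = 1), trained for up to 10K epochs.
train_until_converged(lam_u=1.0, lam_i=0.0, lam_c=0.0)

# Phase 2: with λC fixed at 0, gradually decrease λU and increase λI until
# λI = 1; then, for each retained value of λU, gradually anneal λC.
# The linspace grids below are illustrative stand-ins for the paper's
# non-uniform schedule.
for lam_u in np.linspace(1.0, 0.0, 6):
    lam_i = 1.0 - lam_u
    train_until_converged(lam_u, lam_i, 0.0)
    for lam_c in np.linspace(0.0, 1.0, 5):
        train_until_converged(lam_u, lam_i, lam_c)
```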