Emergent Discrete Communication in Semantic Spaces

Authors: Mycal Tucker, Huao Li, Siddharth Agrawal, Dana Hughes, Katia Sycara, Michael Lewis, Julie A Shah

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show in a decision-theoretic framework that our technique optimizes communication over a wide range of scenarios, whereas one-hot tokens are only optimal under restrictive assumptions. In self-play experiments, we validate that our trained agents learn to cluster tokens in semantically meaningful ways, allowing them to communicate in noisy environments where other techniques fail. Lastly, we demonstrate both that agents using our method can effectively respond to novel human communication and that humans can understand unlabeled emergent agent communication, outperforming the use of one-hot communication.
Researcher Affiliation | Academia | Mycal Tucker (1), Huao Li (2), Siddharth Agrawal (3), Dana Hughes (3), Katia Sycara (3), Michael Lewis (2), and Julie Shah (1); (1) Massachusetts Institute of Technology, (2) University of Pittsburgh, (3) Carnegie Mellon University
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Anonymized code is available at https://anonymous.4open.science/r/NeurIPS-protocomms
Open Datasets | Yes | In our environment, a speaker observed an image, drawn from the CIFAR-10 dataset, and emitted a communication vector [16]. ... [16] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10 (Canadian Institute for Advanced Research). (See the CIFAR-10 loading sketch after this table.)
Dataset Splits | Yes | Details on hyperparameter sweeps and train/test splits are provided in Appendix B and C.
Hardware Specification | Yes | All experiments were performed on a computing cluster with NVIDIA GPUs (1080Ti, 2080Ti, 3090, V100, and A100 models).
Software Dependencies | No | The paper mentions methods like MADDPG and the Gumbel-Softmax trick, but does not provide specific version numbers for software dependencies (e.g., Python or PyTorch versions). (See the Gumbel-Softmax sketch after this table.)
Experiment Setup | Yes | All techniques were trained using the multi-agent deep deterministic policy gradient (MADDPG) method proposed by Lowe et al. [27], a common policy-gradient method that updates neural network weights to maximize the expected discounted reward. ... Details on hyperparameter sweeps and train/test splits are provided in Appendix B and C. (See the MADDPG update sketch after this table.)
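
The Open Datasets and Dataset Splits rows describe a speaker that observes a CIFAR-10 image and emits a communication vector, with the actual splits given in the paper's appendices. The following is a minimal sketch of such a pipeline, not the authors' code: the torchvision loader, the 45,000/5,000 split, the communication dimensionality, and the tiny convolutional speaker are all illustrative assumptions.

```python
# Hypothetical sketch: CIFAR-10 speaker input pipeline with a held-out split.
# The split sizes, transforms, communication dimensionality, and the tiny
# convolutional encoder are illustrative assumptions, not the configuration
# reported in the paper's appendices.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Assumed 90/10 split of the official CIFAR-10 training set.
train_set, val_set = random_split(full_train, [45_000, 5_000])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)


class Speaker(nn.Module):
    """Toy speaker: maps a 32x32 RGB image to a continuous communication vector."""

    def __init__(self, comm_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, comm_dim),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.encoder(images)


speaker = Speaker()
images, _ = next(iter(train_loader))
comm = speaker(images)  # (64, 32) batch of communication vectors
```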
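The Software Dependencies row notes that the paper relies on the Gumbel-Softmax trick, the standard way to emit differentiable one-hot tokens and thus presumably tied to the one-hot communication baseline discussed in the abstract. Below is a minimal sketch using PyTorch's built-in gumbel_softmax; the vocabulary size, observation dimensionality, and temperature are assumptions, not values from the paper.

```python
# Hypothetical sketch of one-hot token emission via the Gumbel-Softmax trick
# (the differentiable one-hot baseline contrasted with the paper's method).
# Vocabulary size, observation dimension, and temperature are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OneHotSpeaker(nn.Module):
    def __init__(self, obs_dim: int, vocab_size: int = 16):
        super().__init__()
        self.logits_net = nn.Linear(obs_dim, vocab_size)

    def forward(self, obs: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.logits_net(obs)
        # hard=True returns exact one-hot tokens in the forward pass while
        # passing gradients through the soft sample (straight-through).
        return F.gumbel_softmax(logits, tau=tau, hard=True)


speaker = OneHotSpeaker(obs_dim=8)
tokens = speaker(torch.randn(4, 8))  # (4, 16) one-hot rows, differentiable w.r.t. logits
```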
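The Experiment Setup row states that all techniques were trained with MADDPG [27]: decentralized actors updated with gradients from a centralized critic that conditions on every agent's observation and action. Below is a heavily simplified sketch of one such update for two agents; the network sizes, optimizer settings, replay-buffer contents, and the omission of target networks and bootstrapping are all simplifying assumptions, not the paper's configuration.

```python
# Hypothetical sketch of a single MADDPG-style update for two agents.
# Centralized critic: sees both observations and both actions.
# Decentralized actors: each sees only its own observation.
# All dimensions and the fake batch are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim, n_agents, batch = 8, 4, 2, 32

actors = [nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
          for _ in range(n_agents)]
critic = nn.Sequential(nn.Linear(n_agents * (obs_dim + act_dim), 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Fake replay-buffer sample (observations, actions, rewards) for illustration.
obs = [torch.randn(batch, obs_dim) for _ in range(n_agents)]
acts = [torch.randn(batch, act_dim) for _ in range(n_agents)]
rewards = torch.randn(batch, 1)

# 1) Critic regression toward the sampled reward (target networks and
#    bootstrapped returns omitted for brevity).
q = critic(torch.cat(obs + acts, dim=-1))
critic_loss = nn.functional.mse_loss(q, rewards)
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

# 2) Deterministic policy gradient for each actor: maximize the centralized Q
#    with that agent's action replaced by its current policy output.
for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
    new_acts = [a.detach() for a in acts]
    new_acts[i] = actor(obs[i])
    actor_loss = -critic(torch.cat(obs + new_acts, dim=-1)).mean()
    opt.zero_grad(); actor_loss.backward(); opt.step()
```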