Emergent Discrete Communication in Semantic Spaces

Authors: Mycal Tucker, Huao Li, Siddharth Agrawal, Dana Hughes, Katia Sycara, Michael Lewis, Julie A Shah

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show in a decision-theoretic framework that our technique optimizes communication over a wide range of scenarios, whereas one-hot tokens are only optimal under restrictive assumptions. In self-play experiments, we validate that our trained agents learn to cluster tokens in semantically meaningful ways, allowing them to communicate in noisy environments where other techniques fail. Lastly, we demonstrate both that agents using our method can effectively respond to novel human communication and that humans can understand unlabeled emergent agent communication, outperforming the use of one-hot communication.
Researcher Affiliation | Academia | Mycal Tucker (1), Huao Li (2), Siddharth Agrawal (3), Dana Hughes (3), Katia Sycara (3), Michael Lewis (2), and Julie Shah (1); (1) Massachusetts Institute of Technology, (2) University of Pittsburgh, (3) Carnegie Mellon University
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Anonymized code is available at https://anonymous.4open.science/r/NeurIPS-protocomms
Open Datasets | Yes | In our environment, a speaker observed an image, drawn from the CIFAR-10 dataset, and emitted a communication vector [16]. ... [16] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10 (Canadian Institute for Advanced Research). (See the CIFAR-10 loading sketch after this table.)
Dataset Splits | Yes | Details on hyperparameter sweeps and train/test splits are provided in Appendix B and C.
Hardware Specification | Yes | All experiments were performed on a computing cluster with NVIDIA GPUs (1080Ti, 2080Ti, 3090, V100, and A100 models).
Software Dependencies | No | The paper mentions methods like MADDPG and the Gumbel-Softmax trick, but does not provide specific version numbers for software dependencies (e.g., Python or PyTorch versions). (See the Gumbel-Softmax sketch after this table.)
Experiment Setup | Yes | All techniques were trained using the multi-agent deep deterministic policy gradient (MADDPG) method proposed by Lowe et al. [27], a common policy-gradient method that updates neural network weights to maximize the expected discounted reward. ... Details on hyperparameter sweeps and train/test splits are provided in Appendix B and C. (See the MADDPG update sketch after this table.)
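
The Open Datasets and Dataset Splits rows describe a speaker that observes a CIFAR-10 image and emits a communication vector, with the actual splits given in the paper's appendices. The following is a minimal sketch of such a pipeline, not the authors' code: the torchvision loader, the 45,000/5,000 split, the communication dimensionality, and the tiny convolutional speaker are all illustrative assumptions.

```python
# Hypothetical sketch: CIFAR-10 speaker input pipeline with a held-out split.
# The split sizes, transforms, communication dimensionality, and the tiny
# convolutional encoder are illustrative assumptions, not the configuration
# reported in the paper's appendices.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Assumed 90/10 split of the official CIFAR-10 training set.
train_set, val_set = random_split(full_train, [45_000, 5_000])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)


class Speaker(nn.Module):
    """Toy speaker: maps a 32x32 RGB image to a continuous communication vector."""

    def __init__(self, comm_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, comm_dim),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.encoder(images)


speaker = Speaker()
images, _ = next(iter(train_loader))
comm = speaker(images)  # (64, 32) batch of communication vectors
```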
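The Software Dependencies row notes that the paper relies on the Gumbel-Softmax trick, the standard way to emit differentiable one-hot tokens and thus presumably tied to the one-hot communication baseline discussed in the abstract. Below is a minimal sketch using PyTorch's built-in gumbel_softmax; the vocabulary size, observation dimensionality, and temperature are assumptions, not values from the paper.

```python
# Hypothetical sketch of one-hot token emission via the Gumbel-Softmax trick
# (the differentiable one-hot baseline contrasted with the paper's method).
# Vocabulary size, observation dimension, and temperature are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OneHotSpeaker(nn.Module):
    def __init__(self, obs_dim: int, vocab_size: int = 16):
        super().__init__()
        self.logits_net = nn.Linear(obs_dim, vocab_size)

    def forward(self, obs: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.logits_net(obs)
        # hard=True returns exact one-hot tokens in the forward pass while
        # passing gradients through the soft sample (straight-through).
        return F.gumbel_softmax(logits, tau=tau, hard=True)


speaker = OneHotSpeaker(obs_dim=8)
tokens = speaker(torch.randn(4, 8))  # (4, 16) one-hot rows, differentiable w.r.t. logits
```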
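The Experiment Setup row states that all techniques were trained with MADDPG [27]: decentralized actors updated with gradients from a centralized critic that conditions on every agent's observation and action. Below is a heavily simplified sketch of one such update for two agents; the network sizes, optimizer settings, replay-buffer contents, and the omission of target networks and bootstrapping are all simplifying assumptions, not the paper's configuration.

```python
# Hypothetical sketch of a single MADDPG-style update for two agents.
# Centralized critic: sees both observations and both actions.
# Decentralized actors: each sees only its own observation.
# All dimensions and the fake batch are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim, n_agents, batch = 8, 4, 2, 32

actors = [nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
          for _ in range(n_agents)]
critic = nn.Sequential(nn.Linear(n_agents * (obs_dim + act_dim), 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Fake replay-buffer sample (observations, actions, rewards) for illustration.
obs = [torch.randn(batch, obs_dim) for _ in range(n_agents)]
acts = [torch.randn(batch, act_dim) for _ in range(n_agents)]
rewards = torch.randn(batch, 1)

# 1) Critic regression toward the sampled reward (target networks and
#    bootstrapped returns omitted for brevity).
q = critic(torch.cat(obs + acts, dim=-1))
critic_loss = nn.functional.mse_loss(q, rewards)
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

# 2) Deterministic policy gradient for each actor: maximize the centralized Q
#    with that agent's action replaced by its current policy output.
for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
    new_acts = [a.detach() for a in acts]
    new_acts[i] = actor(obs[i])
    actor_loss = -critic(torch.cat(obs + new_acts, dim=-1)).mean()
    opt.zero_grad(); actor_loss.backward(); opt.step()
```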