Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning

Authors: Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we show that our proposed approach empirically compares favourably to other MARL baselines, validate the importance of specific components via ablation studies and illustrate how our method can act as a measure of channel capacity to learn where best to communicate." ... "As part of our contribution, we designed a configurable environment with discrete state and action space, the phone booth maze, to evaluate agents in the CTD/CTU setting."
Researcher Affiliation | Collaboration | Yat Long Lo (University of Oxford / Dyson Robot Learning Lab, richie.lo@dyson.com); Christian Schroeder de Witt (FLAIR, University of Oxford, cs@robots.ox.ac.uk); Samuel Sokota (Carnegie Mellon University, ssokota@andrew.cmu.edu); Jakob Foerster (FLAIR, University of Oxford, jakob.foerster@eng.ox.ac.uk); Shimon Whiteson (University of Oxford, shimon.whiteson@cs.ox.ac.uk)
Pseudocode | Yes | "Algorithm 1 outlines the pseudocode of the proposed architecture to tackle the cheap talk discovery and utilization problem." ... "Algorithm 1 Pseudocode for our proposed method"
Open Source Code | No | The abstract states, "We also release a novel benchmark suite to stimulate future research in CTD/CTU." However, the paper does not explicitly state that the source code for the method itself (CTDL/CTDUL) is open-source, nor does it provide a link to a repository.
Open Datasets | No | The paper states, "As part of our contribution, we designed a configurable environment with discrete state and action space, the phone booth maze, to evaluate agents in the CTD/CTU setting." While the authors developed a new environment and benchmark suite, they do not provide access information (e.g., a URL, DOI, or formal citation) for a publicly available, pre-collected dataset that could be downloaded for training.
Dataset Splits | Yes | "We trained all methods for 12000 episodes (80000 episodes for CTU) and evaluated on test episodes every 20 episodes by taking the corresponding greedy policy." ... "We performed a hyperparameter sweep over common hyperparameters, fixing them across all methods, and specific sweeps for method-specific parameters. Please see Appendix F and G for training and hyperparameter details."
Hardware Specification | Yes | "Training was done in an internal cluster with a mix of GTX 1080 and RTX 2080 GPUs."
Software Dependencies | No | "All neural network components are implemented using the neural network library PyTorch (Paszke et al., 2019)..." The paper mentions PyTorch but does not specify its version number, nor does it list versions of other ancillary software or libraries used.
Experiment Setup | Yes | "We trained all methods for 12000 episodes (80000 episodes for CTU) and evaluated on test episodes every 20 episodes by taking the corresponding greedy policy. Each algorithm reports averaged results from 4 random seeds with standard error. We performed a hyperparameter sweep over common hyperparameters, fixing them across all methods, and specific sweeps for method-specific parameters. Please see Appendix F and G for training and hyperparameter details." ... "Table 2: Common parameters used across algorithms" ... "Table 3: Method-specific parameters" ... "Table 4: Table for environment configurations of each environment used in the experiments"
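The reported aggregation ("averaged results from 4 random seeds with standard error") corresponds to the standard mean and std/sqrt(n) computation. A minimal sketch of that arithmetic, using hypothetical per-seed return values (the paper does not publish per-seed numbers):

```python
import math

# Hypothetical per-seed test returns; the paper uses 4 random seeds.
seed_returns = [0.82, 0.79, 0.85, 0.80]

n = len(seed_returns)
mean = sum(seed_returns) / n

# Sample standard deviation (Bessel's correction, n - 1 in the denominator),
# then standard error of the mean = std / sqrt(n).
variance = sum((r - mean) ** 2 for r in seed_returns) / (n - 1)
std_error = math.sqrt(variance) / math.sqrt(n)

print(f"{mean:.3f} +/- {std_error:.3f}")
```

With only 4 seeds the standard error is a rough uncertainty estimate, which is why fixing common hyperparameters across all methods (as the authors do) matters for fair comparison.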