Inference-Based Deterministic Messaging For Multi-Agent Communication

Authors: Varun Bhatt, Michael Buro (pp. 11228-11236)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. In this paper, we first study learning in matrix-based signaling games to empirically show that decentralized methods can converge to a suboptimal policy. We then propose a modification to the messaging policy, in which the sender deterministically chooses the best message that helps the receiver infer the sender's observation. Using this modification, we see, empirically, that the agents converge to the optimal policy in nearly all the runs. We then apply this method to a partially observable gridworld environment which requires cooperation between two agents and show that, with appropriate approximation methods, the proposed sender modification can enhance existing decentralized training methods in more complex domains as well. [...] Figures 6 and 7 show plots of the mean normalized reward, as a function of episodes, on 3×3 and 32×32 payoff matrices respectively. [...] Table 1 shows the results we obtained for the methods presented in Eccles et al. (2019) and our inference-based method.
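The sender modification quoted above, deterministically choosing the message that best helps the receiver infer the sender's observation, can be sketched as follows. This is a minimal illustration for the tabular signaling-game setting, not the paper's Algorithm 1; the `receiver_belief` table and all names are assumptions made for this example.

```python
def choose_message(obs, receiver_belief, num_messages):
    """Deterministic inference-based messaging (illustrative sketch).

    Pick the message under which the receiver's estimated posterior
    places the most probability on the sender's true observation.
    receiver_belief[m][o] approximates P(observation = o | message = m),
    e.g. maintained from counts of past (message, observation) pairs.
    """
    return max(range(num_messages), key=lambda m: receiver_belief[m][obs])


# Toy example: two observations, two messages.
belief = [[0.7, 0.3],   # message 0 is mostly sent under observation 0
          [0.2, 0.8]]   # message 1 is mostly sent under observation 1
choose_message(0, belief, 2)  # → 0
choose_message(1, belief, 2)  # → 1
```

Because the choice is a deterministic argmax rather than a sample from a stochastic policy, ties in the sender's messaging convention are broken consistently, which is the intuition behind the improved convergence the review row reports.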
Researcher Affiliation: Academia. Varun Bhatt, Michael Buro, Department of Computing Science, University of Alberta, Canada. vbhatt@ualberta.ca, mburo@ualberta.ca
Pseudocode: Yes. Algorithm 1: Inference-Based Messaging (Signaling Game, Tabular Case) and Algorithm 2: Inference-Based Messaging (Gridworld, Approximation Case).
Open Source Code: No. The paper does not provide a direct link to a code repository, nor does it state that the code will be released or is available as supplementary material.
Open Datasets: No. The paper mentions using 'the gridworld environment called Treasure Hunt introduced by Eccles et al. (2019)' but does not provide specific access information (link, DOI, repository name, or formal citation for data access) for this environment. It cites the paper that introduced it, but that alone does not guarantee availability.
Dataset Splits: No. The paper discusses training and testing, but does not explicitly provide details about train/validation/test splits, such as percentages, sample counts, or references to predefined splits.
Hardware Specification: No. The paper does not provide any specific hardware details, such as GPU/CPU models, memory, or cloud instance types, used for running the experiments. It only mentions 'computational constraints'.
Software Dependencies: No. The paper mentions specific software libraries such as 'RLLib (Liang et al. 2018)' but does not provide version numbers for them, which are required for reproducibility.
Experiment Setup: No. The paper mentions specific RL algorithms like 'Q-Learning' and 'IMPALA' and components like 'LSTM' and 'Contrastive Predictive Coding' as part of the setup. It also states that 'All Q-values of the sender were initialized pessimistically', that 'the receiver used Q-Learning with optimistically initialized Q-values', that for Hysteretic-Q a 'higher step size is used for positive Q-value updates and lower step size for negative Q-value updates', and that for Lenience 'only positive updates are made on Q-values during initial time steps'. However, it does not provide specific numerical values for hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings for any of the algorithms mentioned (e.g., Q-Learning, REINFORCE, IMPALA, or the modifications).