Communicating via Markov Decision Processes

Authors: Samuel Sokota, Christian A Schroeder De Witt, Maximilian Igl, Luisa M Zintgraf, Philip Torr, Martin Strohmeier, Zico Kolter, Shimon Whiteson, Jakob Foerster

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs.
Researcher Affiliation | Collaboration | 1 Carnegie Mellon University, 2 Oxford University, 3 Waymo Research, 4 armasuisse Science + Technology, 5 Bosch Center for AI
Pseudocode | Yes | Algorithm 1: MEME (Sender); Algorithm 2: MEME (Receiver); Algorithm 3: Factored MEME; Algorithm 4: RL+PR baseline; Algorithm 5: Min Entropy Joint Distribution; Algorithm 6: Lemma 3-Sparse
Open Source Code | Yes | Our codebase is available at https://github.com/schroederdewitt/meme.
Open Datasets | Yes | To demonstrate the efficacy of MEME, we present experiments for MCGs based on a gridworld, Cartpole, and Pong (Bellemare et al., 2013)... We used 200k training episodes for Code Grid and 2M training episodes for Code Pong.
Dataset Splits | No | The paper mentions training episodes and evaluation results, but it does not explicitly describe train/validation/test dataset splits or a cross-validation methodology.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions general computational aspects like 'trained our models' and 'neural networks'.
Software Dependencies | Yes | For Code Grid and Code Pong, layer weights are randomly initialized using PyTorch 1.7 (Paszke et al., 2017) defaults.
Experiment Setup | Yes | For Code Grid, we use a policy parameterized by a neural network with two fully-connected layers of hidden dimension 64, each followed by a ReLU activation... For Code Pong and Code Cart, we use a convolutional encoder with three layers of convolutions (number of channels, kernel size, stride) as follows: (32, 8, 4), (64, 4, 2), (64, 3, 1)... For all environments, we used the Adam optimizer with learning rate 10⁻⁴, β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸ and no weight decay.
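As a rough illustration of the Experiment Setup row above, the PyTorch sketch below instantiates networks and an optimizer matching the reported description. The observation dimension, number of actions, input channel count, activations between the convolutional layers, and the output head are assumptions not stated in the quoted excerpt; only the layer sizes, conv parameters, and Adam hyperparameters come from the paper.

```python
import torch
import torch.nn as nn

# Code Grid policy as reported: two fully-connected layers of hidden
# dimension 64, each followed by a ReLU (PyTorch 1.7 default init).
# obs_dim, n_actions, and the final action head are placeholders/assumptions.
def make_grid_policy(obs_dim: int, n_actions: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(obs_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, n_actions),  # assumed output layer over actions
    )

# Code Pong / Code Cart convolutional encoder as reported:
# (channels, kernel, stride) = (32, 8, 4), (64, 4, 2), (64, 3, 1).
# The input channel count and the ReLUs between conv layers are assumptions.
def make_conv_encoder(in_channels: int = 4) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        nn.Flatten(),
    )

policy = make_grid_policy(obs_dim=16, n_actions=4)  # example sizes only

# Adam settings as reported: lr = 1e-4, betas = (0.9, 0.999), eps = 1e-8,
# and no weight decay.
optimizer = torch.optim.Adam(
    policy.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0
)
```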