Communicating via Markov Decision Processes
Authors: Samuel Sokota, Christian A Schroeder De Witt, Maximilian Igl, Luisa M Zintgraf, Philip Torr, Martin Strohmeier, Zico Kolter, Shimon Whiteson, Jakob Foerster
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University 2Oxford University 3Waymo Research 4armasuisse Science + Technology 5Bosch Center for AI |
| Pseudocode | Yes | Algorithm 1 MEME (Sender); Algorithm 2 MEME (Receiver); Algorithm 3 Factored MEME; Algorithm 4 RL+PR baseline; Algorithm 5 Min Entropy Joint Distribution; Algorithm 6 Lemma3-Sparse |
| Open Source Code | Yes | Our codebase is available at https://github.com/schroederdewitt/meme. |
| Open Datasets | Yes | To demonstrate the efficacy of MEME, we present experiments for MCGs based on a gridworld, Cartpole, and Pong (Bellemare et al., 2013)... We used 200k training episodes for Code Grid and 2M training episodes for Code Pong. |
| Dataset Splits | No | The paper mentions training episodes and evaluating results, but it does not explicitly describe train/validation/test dataset splits or cross-validation methodology. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions general computational aspects like 'trained our models' and 'neural networks'. |
| Software Dependencies | Yes | For Code Grid and Code Pong, layer weights are randomly initialized using PyTorch 1.7 (Paszke et al., 2017) defaults. |
| Experiment Setup | Yes | For Code Grid, we use a policy parameterized by a neural network with two fully-connected layers of hidden dimension 64, each followed by a ReLU activation... For Code Pong and Code Cart, we use a convolutional encoder with three layers of convolutions (number of channels, kernel size, stride) as follows: (32,8,4), (64,4,2), (64,3,1)... For all environments, we used the Adam optimizer with learning rate 10⁻⁴, β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸ and no weight decay. |
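The convolutional encoder and optimizer settings quoted in the Experiment Setup row can be sketched in PyTorch. This is a minimal reconstruction, not the authors' code: the single input channel and 84x84 frame size are assumptions based on standard Atari preprocessing, and are not stated in the quoted text.

```python
import torch
import torch.nn as nn

# Convolutional encoder with (channels, kernel, stride) =
# (32,8,4), (64,4,2), (64,3,1), as quoted from the paper.
# Input channel count (1) and frame size (84x84) are assumptions.
encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
    nn.Flatten(),
)

# Adam with the quoted hyperparameters: lr 1e-4, betas (0.9, 0.999),
# eps 1e-8, no weight decay.
optimizer = torch.optim.Adam(
    encoder.parameters(), lr=1e-4, betas=(0.9, 0.999),
    eps=1e-8, weight_decay=0.0,
)

x = torch.zeros(2, 1, 84, 84)  # batch of two assumed 84x84 frames
feats = encoder(x)
print(feats.shape)  # torch.Size([2, 3136]): 64 channels x 7 x 7
```

Under these input assumptions, the three strided convolutions reduce an 84x84 frame to a 7x7 spatial map, giving a 3136-dimensional flattened feature vector per frame.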