Communicating via Markov Decision Processes
Authors: Samuel Sokota, Christian A Schroeder De Witt, Maximilian Igl, Luisa M Zintgraf, Philip Torr, Martin Strohmeier, Zico Kolter, Shimon Whiteson, Jakob Foerster
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University 2Oxford University 3Waymo Research 4armasuisse Science + Technology 5Bosch Center for AI |
| Pseudocode | Yes | Algorithm 1 MEME (Sender); Algorithm 2 MEME (Receiver); Algorithm 3 Factored MEME; Algorithm 4 RL+PR baseline; Algorithm 5 Min Entropy Joint Distribution; Algorithm 6 Lemma3-Sparse |
| Open Source Code | Yes | Our codebase is available at https://github.com/schroederdewitt/meme. |
| Open Datasets | Yes | To demonstrate the efficacy of MEME, we present experiments for MCGs based on a gridworld, Cartpole, and Pong (Bellemare et al., 2013)... We used 200k training episodes for Code Grid and 2M training episodes for Code Pong. |
| Dataset Splits | No | The paper mentions training episodes and evaluating results, but it does not explicitly describe train/validation/test dataset splits or cross-validation methodology. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions general computational aspects like 'trained our models' and 'neural networks'. |
| Software Dependencies | Yes | For Code Grid and Code Pong, layer weights are randomly initialized using PyTorch 1.7 (Paszke et al., 2017) defaults. |
| Experiment Setup | Yes | For Code Grid, we use a policy parameterized by a neural network with two fully-connected layers of hidden dimension 64, each followed by a ReLU activation... For Code Pong and Code Cart, we use a convolutional encoder with three layers of convolutions (number of channels, kernel size, stride) as follows: (32,8,4), (64,4,2), (64,3,1)... For all environments, we used the Adam optimizer with learning rate 10⁻⁴, β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸ and no weight decay. |
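The convolutional encoder and optimizer settings quoted in the Experiment Setup row can be sketched in PyTorch. This is a minimal reconstruction, not the authors' code: the single input channel and 84x84 frame size are assumptions based on standard Atari preprocessing, and are not stated in the quoted text.

```python
import torch
import torch.nn as nn

# Convolutional encoder with (channels, kernel, stride) =
# (32,8,4), (64,4,2), (64,3,1), as quoted from the paper.
# Input channel count (1) and frame size (84x84) are assumptions.
encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
    nn.Flatten(),
)

# Adam with the quoted hyperparameters: lr 1e-4, betas (0.9, 0.999),
# eps 1e-8, no weight decay.
optimizer = torch.optim.Adam(
    encoder.parameters(), lr=1e-4, betas=(0.9, 0.999),
    eps=1e-8, weight_decay=0.0,
)

x = torch.zeros(2, 1, 84, 84)  # batch of two assumed 84x84 frames
feats = encoder(x)
print(feats.shape)  # torch.Size([2, 3136]): 64 channels x 7 x 7
```

Under these input assumptions, the three strided convolutions reduce an 84x84 frame to a 7x7 spatial map, giving a 3136-dimensional flattened feature vector per frame.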