Iterative Empirical Game Solving via Single Policy Best Response

Authors: Max Smith, Thomas Anthony, Michael Wellman

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate that these algorithms substantially reduce the amount of simulation during training required by PSRO, while producing equivalent or better solutions to the game."
Researcher Affiliation | Collaboration | Max Olan Smith (University of Michigan, mxsmith@umich.edu); Thomas Anthony (DeepMind, twa@google.com); Michael P. Wellman (University of Michigan, wellman@umich.edu)
Pseudocode | Yes | Algorithm 2: Mixed-Oracles (an illustrative sketch of the underlying Q-mixing step follows this table)
Open Source Code | No | The paper mentions using 'the DeepMind RL library for Agents', which 'is open-source (github.com/deepmind/acme)', but it does not provide a link to, or an explicit statement about releasing, source code for the methodology described in this paper.
Open Datasets | Yes | "We evaluate our algorithms on the Gathering (Perolat et al., 2017) and Leduc Poker (Southey et al., 2005) games, both of which are commonly used in the multiagent reinforcement learning field."
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages or sample counts) needed to reproduce data partitioning; it describes evaluation strategies and hyperparameter selection instead.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments.
Software Dependencies | No | The paper names software ('DeepMind RL library for Agents', 'Acme') and algorithms ('Double Q-Learning', 'IMPALA', 'DQN', 'MPO', 'Adam optimizer'), but does not provide version numbers for any of these components.
Experiment Setup | Yes | "300 hyperparameter settings are sampled in each environment. Complete details are provided in Section D."
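
The Pseudocode row refers to Algorithm 2 (Mixed-Oracles) in the paper, which reuses best responses trained against individual opponent policies rather than retraining against every opponent mixture. As a rough, hypothetical illustration of the Q-value combination step behind that idea, the sketch below weights per-opponent Q-values by the mixture probabilities and acts greedily on the result; the function name `q_mix` and all numbers are invented for illustration and are not the paper's implementation.

```python
import numpy as np

def q_mix(per_opponent_q, opponent_mixture):
    """Combine single-opponent Q-values into one Q-function for a mixture.

    per_opponent_q: array of shape (num_opponents, num_actions), the Q-values
        of the best response trained against each individual opponent policy,
        evaluated at the current state.
    opponent_mixture: array of shape (num_opponents,), the probability of
        facing each opponent policy under the empirical-game solution.
    Returns mixture-weighted Q-values of shape (num_actions,).
    """
    q = np.asarray(per_opponent_q, dtype=float)
    w = np.asarray(opponent_mixture, dtype=float)
    return w @ q  # weighted average of per-opponent action values

# Toy example (hypothetical numbers): two stored best responses, three actions.
q_vs_opponent_a = [1.0, 0.2, 0.0]
q_vs_opponent_b = [0.0, 0.5, 1.5]
mixture = [0.7, 0.3]  # assumed distribution over the two opponent policies

mixed_q = q_mix([q_vs_opponent_a, q_vs_opponent_b], mixture)
greedy_action = int(np.argmax(mixed_q))
print(mixed_q, greedy_action)
```

Acting greedily on the mixed Q-values lets previously trained single-opponent responses be reused when the opponent mixture changes, which is consistent with the reduction in training simulation that the Research Type row quotes from the paper.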