Iterative Empirical Game Solving via Single Policy Best Response

Authors: Max Smith, Thomas Anthony, Michael Wellman

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate that these algorithms substantially reduce the amount of simulation during training required by PSRO, while producing equivalent or better solutions to the game."
Researcher Affiliation | Collaboration | Max Olan Smith (University of Michigan, mxsmith@umich.edu); Thomas Anthony (DeepMind, twa@google.com); Michael P. Wellman (University of Michigan, wellman@umich.edu)
Pseudocode | Yes | Algorithm 2: Mixed-Oracles (an illustrative sketch of the underlying Q-mixing step follows this table)
Open Source Code | No | The paper mentions using 'the DeepMind RL library for Agents', which 'is open-source (github.com/deepmind/acme)', but it does not provide a link to, or an explicit statement about releasing, source code for the methodology described in this paper.
Open Datasets | Yes | "We evaluate our algorithms on the Gathering (Perolat et al., 2017) and Leduc Poker (Southey et al., 2005) games, both of which are commonly used in the multiagent reinforcement learning field."
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages or sample counts) needed to reproduce data partitioning; it describes evaluation strategies and hyperparameter selection instead.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments.
Software Dependencies | No | The paper names software ('DeepMind RL library for Agents', 'Acme') and algorithms ('Double Q-Learning', 'IMPALA', 'DQN', 'MPO', 'Adam optimizer'), but does not provide version numbers for any of these components.
Experiment Setup | Yes | "300 hyperparameter settings are sampled in each environment. Complete details are provided in Section D."
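
The Pseudocode row refers to Algorithm 2 (Mixed-Oracles) in the paper, which reuses best responses trained against individual opponent policies rather than retraining against every opponent mixture. As a rough, hypothetical illustration of the Q-value combination step behind that idea, the sketch below weights per-opponent Q-values by the mixture probabilities and acts greedily on the result; the function name `q_mix` and all numbers are invented for illustration and are not the paper's implementation.

```python
import numpy as np

def q_mix(per_opponent_q, opponent_mixture):
    """Combine single-opponent Q-values into one Q-function for a mixture.

    per_opponent_q: array of shape (num_opponents, num_actions), the Q-values
        of the best response trained against each individual opponent policy,
        evaluated at the current state.
    opponent_mixture: array of shape (num_opponents,), the probability of
        facing each opponent policy under the empirical-game solution.
    Returns mixture-weighted Q-values of shape (num_actions,).
    """
    q = np.asarray(per_opponent_q, dtype=float)
    w = np.asarray(opponent_mixture, dtype=float)
    return w @ q  # weighted average of per-opponent action values

# Toy example (hypothetical numbers): two stored best responses, three actions.
q_vs_opponent_a = [1.0, 0.2, 0.0]
q_vs_opponent_b = [0.0, 0.5, 1.5]
mixture = [0.7, 0.3]  # assumed distribution over the two opponent policies

mixed_q = q_mix([q_vs_opponent_a, q_vs_opponent_b], mixture)
greedy_action = int(np.argmax(mixed_q))
print(mixed_q, greedy_action)
```

Acting greedily on the mixed Q-values lets previously trained single-opponent responses be reused when the opponent mixture changes, which is consistent with the reduction in training simulation that the Research Type row quotes from the paper.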