Iterative Empirical Game Solving via Single Policy Best Response
Authors: Max Smith, Thomas Anthony, Michael Wellman
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that these algorithms substantially reduce the amount of simulation during training required by PSRO, while producing equivalent or better solutions to the game. |
| Researcher Affiliation | Collaboration | Max Olan Smith, University of Michigan (mxsmith@umich.edu); Thomas Anthony, DeepMind (twa@google.com); Michael P. Wellman, University of Michigan (wellman@umich.edu) |
| Pseudocode | Yes | Algorithm 2: Mixed-Oracles |
| Open Source Code | No | The paper mentions using 'the DeepMind RL library for Agents. This library is open-source (github.com/deepmind/acme)', but it provides no link to, or explicit statement about releasing, source code for the methodology described in the paper itself. |
| Open Datasets | Yes | We evaluate our algorithms on the Gathering (Perolat et al., 2017) and Leduc Poker (Southey et al., 2005) games, both of which are commonly used in the multiagent reinforcement learning field. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages or sample counts) needed to reproduce data partitioning. It describes evaluation strategies and hyperparameter selection. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like 'DeepMind RL library for Agents' and 'Acme' and algorithms like 'Double Q-Learning', 'IMPALA', 'DQN', 'MPO', and 'Adam optimizer', but does not provide specific version numbers for any of these components. |
| Experiment Setup | Yes | 300 hyperparameter settings are sampled in each environment. Complete details are provided in Section D. |
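For context on what the "Pseudocode" row refers to: the paper's Mixed-Oracles algorithm is a variant of Policy-Space Response Oracles (PSRO), which iteratively grows a policy population by best-responding to a meta-strategy over the current empirical game. The sketch below is a generic, heavily simplified PSRO-style loop, not the paper's Algorithm 2; the payoff oracle, the uniform meta-strategy solver, and the candidate-enumeration "best response" are all toy stand-ins (a real implementation would run game simulations, solve for a Nash equilibrium of the empirical payoff matrix, and train an RL best responder).

```python
import random

def simulate(policy_a, policy_b):
    # Toy payoff oracle: stands in for running game simulations.
    # Policies are plain integers here; the payoff is a seeded random value.
    rng = random.Random(policy_a * 31 + policy_b)
    return rng.uniform(-1.0, 1.0)

def uniform_meta_strategy(population):
    # Toy meta-strategy solver: real PSRO would compute a Nash equilibrium
    # of the empirical payoff matrix instead of a uniform mixture.
    n = len(population)
    return [1.0 / n] * n

def best_response(opponent_population, opponent_dist, candidates):
    # Pick the candidate with the highest expected payoff against the
    # opponent's meta-strategy (stands in for RL best-response training).
    def expected_payoff(policy):
        return sum(p * simulate(policy, opp)
                   for p, opp in zip(opponent_dist, opponent_population))
    return max(candidates, key=expected_payoff)

def psro(epochs=3, candidates=tuple(range(10))):
    # Symmetric two-player empirical game: one shared policy population.
    population = [0]  # arbitrary initial policy
    for _ in range(epochs):
        dist = uniform_meta_strategy(population)
        br = best_response(population, dist, candidates)
        if br not in population:
            population.append(br)
    return population
```

The paper's contribution (Mixed-Oracles / Mixed-Opponents) changes how the best-response step reuses experience across epochs to cut the simulation cost that dominates this loop.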