Solving Common-Payoff Games with Approximate Policy Iteration
Authors: Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot
AAAI 2021, pp. 9695-9703 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so. |
| Researcher Affiliation | Collaboration | 1University of Alberta 2DeepMind 3Mila, Université de Montréal sokota@ualberta.ca, locked@google.com, finbarrtimbers@google.com, elnazd@google.com, ryan.dorazio@mila.quebec, burchn@google.com, mschmid@google.com, bowlingm@google.com, lanctot@google.com |
| Pseudocode | Yes | We provide pseudocode for CAPI in Algorithm 1. |
| Open Source Code | Yes | The code used to generate the results for CAPI is available at https://github.com/ssokota/capi. |
| Open Datasets | Yes | We consider two common-payoff games from OpenSpiel (Lanctot, Lockhart et al. 2019) to demonstrate the efficacy of CAPI. [...] Code for the Tiny Hanabi Suite is available at https://github.com/ssokota/tiny-hanabi. |
| Dataset Splits | No | The paper mentions running experiments for a certain number of episodes and tuning hyperparameters, but it does not provide specific details on dataset splits (e.g., percentages or counts) for training, validation, or testing within those games. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cluster specifications) used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | No | The paper mentions that algorithms were "tuned across nine hyperparameter settings" and that "Implementation details can be found in the appendix." However, the main text does not provide concrete hyperparameter values or detailed system-level training settings. |
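For context on the Pseudocode row above: the paper's Algorithm 1 specifies CAPI itself and is not reproduced here. The sketch below is only a minimal toy illustration of how greedy joint-policy improvement can stall at a suboptimal joint policy in a one-shot common-payoff game, the kind of coordination failure the paper's experiments probe. The payoff matrix and function names are hypothetical, not taken from the paper.

```python
# Minimal toy sketch (hypothetical payoffs, NOT the paper's Algorithm 1):
# alternating best response on a one-shot common-payoff matrix game.
# Depending on the starting joint action, the procedure can stop at a
# suboptimal fixed point with shared payoff 8 instead of the optimum 10.
import itertools

import numpy as np

# PAYOFF[a1, a2] is the reward shared by both players.
PAYOFF = np.array([
    [10.0, 0.0, 0.0],
    [4.0, 8.0, 4.0],
    [10.0, 0.0, 0.0],
])


def alternating_best_response(payoff, a1, a2, max_iters=20):
    """Each player in turn best responds to the other's current action."""
    for _ in range(max_iters):
        a1_next = int(np.argmax(payoff[:, a2]))       # player 1 best responds
        a2_next = int(np.argmax(payoff[a1_next, :]))  # player 2 best responds
        if (a1_next, a2_next) == (a1, a2):
            break  # no unilateral improvement: a (possibly local) optimum
        a1, a2 = a1_next, a2_next
    return (a1, a2), float(payoff[a1, a2])


for start in itertools.product(range(3), range(3)):
    print("start", start, "->", alternating_best_response(PAYOFF, *start))
```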
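Relating to the Open Datasets row: the games come from OpenSpiel, so a reproduction attempt would typically start by loading them through the `pyspiel` Python API. A minimal sketch, assuming OpenSpiel is installed and that its registered `tiny_hanabi` game corresponds to one of the environments used (the exact game names and variants in the paper's code may differ):

```python
# Hedged sketch: load an OpenSpiel game and play one random episode.
# Assumes the open_spiel package is installed; "tiny_hanabi" is one of the
# games registered in OpenSpiel, but the paper's experiments may use other
# variants from the Tiny Hanabi suite linked above.
import random

import pyspiel

game = pyspiel.load_game("tiny_hanabi")
print("players:", game.num_players(), "max utility:", game.max_utility())

state = game.new_initial_state()
while not state.is_terminal():
    # Chance nodes (card deals) and player nodes both expose legal_actions().
    action = random.choice(state.legal_actions())
    state.apply_action(action)

# Common-payoff game: every player receives the same return.
print("returns:", state.returns())
```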