Solving Common-Payoff Games with Approximate Policy Iteration

Authors: Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot. Pages 9695-9703.

AAAI 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so." |
| Researcher Affiliation | Collaboration | 1 University of Alberta, 2 DeepMind, 3 Mila, Université de Montréal. Contact: sokota@ualberta.ca, locked@google.com, finbarrtimbers@google.com, elnazd@google.com, ryan.dorazio@mila.quebec, burchn@google.com, mschmid@google.com, bowlingm@google.com, lanctot@google.com |
| Pseudocode | Yes | "We provide pseudocode for CAPI in Algorithm 1." |
| Open Source Code | Yes | "The code used to generate the results for CAPI is available at https://github.com/ssokota/capi." |
| Open Datasets | Yes | "We consider two common-payoff games from OpenSpiel (Lanctot, Lockhart et al. 2019) to demonstrate the efficacy of CAPI. [...] Code for the Tiny Hanabi Suite is available at https://github.com/ssokota/tiny-hanabi." |
| Dataset Splits | No | The paper reports running experiments for a certain number of episodes and tuning hyperparameters, but it does not give dataset splits (e.g., percentages or counts) for training, validation, or testing within those games. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., CPU or GPU models, memory, or cluster configuration). |
| Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or versions of other libraries). |
| Experiment Setup | No | The paper states that algorithms were "tuned across nine hyperparameter settings" and that "Implementation details can be found in the appendix," but the main text does not provide concrete hyperparameter values or detailed system-level training settings. |