Solving Common-Payoff Games with Approximate Policy Iteration
Authors: Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot
AAAI 2021, pp. 9695-9703 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so. |
| Researcher Affiliation | Collaboration | 1University of Alberta 2DeepMind 3Mila, Université de Montréal sokota@ualberta.ca, locked@google.com, finbarrtimbers@google.com, elnazd@google.com, ryan.dorazio@mila.quebec, burchn@google.com, mschmid@google.com, bowlingm@google.com, lanctot@google.com |
| Pseudocode | Yes | We provide pseudocode for CAPI in Algorithm 1. |
| Open Source Code | Yes | The code used to generate the results for CAPI is available at https://github.com/ssokota/capi. |
| Open Datasets | Yes | We consider two common-payoff games from OpenSpiel (Lanctot, Lockhart et al. 2019) to demonstrate the efficacy of CAPI. [...] Code for the Tiny Hanabi Suite is available at https://github.com/ssokota/tiny-hanabi. |
| Dataset Splits | No | The paper mentions running experiments for a certain number of episodes and tuning hyperparameters, but it does not provide specific details on dataset splits (e.g., percentages or counts) for training, validation, or testing within those games. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cluster specifications) used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | No | The paper mentions that algorithms were "tuned across nine hyperparameter settings" and that "Implementation details can be found in the appendix." However, the main text does not provide concrete hyperparameter values or detailed system-level training settings. |
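For context on the Pseudocode row above: the paper's Algorithm 1 specifies CAPI itself and is not reproduced here. The sketch below is only a minimal toy illustration of how greedy joint-policy improvement can stall at a suboptimal joint policy in a one-shot common-payoff game, the kind of coordination failure the paper's experiments probe. The payoff matrix and function names are hypothetical, not taken from the paper.

```python
# Minimal toy sketch (hypothetical payoffs, NOT the paper's Algorithm 1):
# alternating best response on a one-shot common-payoff matrix game.
# Depending on the starting joint action, the procedure can stop at a
# suboptimal fixed point with shared payoff 8 instead of the optimum 10.
import itertools

import numpy as np

# PAYOFF[a1, a2] is the reward shared by both players.
PAYOFF = np.array([
    [10.0, 0.0, 0.0],
    [4.0, 8.0, 4.0],
    [10.0, 0.0, 0.0],
])


def alternating_best_response(payoff, a1, a2, max_iters=20):
    """Each player in turn best responds to the other's current action."""
    for _ in range(max_iters):
        a1_next = int(np.argmax(payoff[:, a2]))       # player 1 best responds
        a2_next = int(np.argmax(payoff[a1_next, :]))  # player 2 best responds
        if (a1_next, a2_next) == (a1, a2):
            break  # no unilateral improvement: a (possibly local) optimum
        a1, a2 = a1_next, a2_next
    return (a1, a2), float(payoff[a1, a2])


for start in itertools.product(range(3), range(3)):
    print("start", start, "->", alternating_best_response(PAYOFF, *start))
```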
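Relating to the Open Datasets row: the games come from OpenSpiel, so a reproduction attempt would typically start by loading them through the `pyspiel` Python API. A minimal sketch, assuming OpenSpiel is installed and that its registered `tiny_hanabi` game corresponds to one of the environments used (the exact game names and variants in the paper's code may differ):

```python
# Hedged sketch: load an OpenSpiel game and play one random episode.
# Assumes the open_spiel package is installed; "tiny_hanabi" is one of the
# games registered in OpenSpiel, but the paper's experiments may use other
# variants from the Tiny Hanabi suite linked above.
import random

import pyspiel

game = pyspiel.load_game("tiny_hanabi")
print("players:", game.num_players(), "max utility:", game.max_utility())

state = game.new_initial_state()
while not state.is_terminal():
    # Chance nodes (card deals) and player nodes both expose legal_actions().
    action = random.choice(state.legal_actions())
    state.apply_action(action)

# Common-payoff game: every player receives the same return.
print("returns:", state.returns())
```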