Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Guarantees for Self-Play in Multiplayer Games via Polymatrix Decomposability

Authors: Revan MacQueen, James Wright

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We demonstrate our findings through experiments on Leduc poker. |
| Researcher Affiliation | Academia | Revan MacQueen, Department of Computing Science, University of Alberta / Amii; James R. Wright, Department of Computing Science, University of Alberta / Amii |
| Pseudocode | Yes | Algorithm 1 SGDecompose; Algorithm 2 Compute γ; Algorithm 3 SGDecompose with behavior strategies; Algorithm 4 get BRs |
| Open Source Code | Yes | The codebase for our experiments is available at https://github.com/RevanMacQueen/Self-Play-Polymatrix. |
| Open Datasets | Yes | Leduc poker was originally developed for two players but was extended to a 3-player variant by Abou Risk & Szafron (2010); we use the 3-player variant here. |
| Dataset Splits | No | The paper does not specify explicit train/validation/test dataset splits in the context of supervised learning, as it focuses on learning through self-play in games. |
| Hardware Specification | No | The paper mentions 'Computation for this work was provided by the Digital Research Alliance of Canada' but does not specify exact hardware details such as GPU/CPU models or memory amounts. |
| Software Dependencies | No | The paper mentions software components like 'CFR+' and 'OpenSpiel implementation' but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We used the same parameters for each run of SGDecompose: λ = 0.5, B = 30, T = 200. We used a learning rate schedule where the learning rate η begins at 2^{-6}, then halves every 5 epochs until reaching 2^{-17} to encourage convergence. We randomly initialize CFR+ with regrets between 0 and 0.001 chosen uniformly at random, which are the default values in OpenSpiel. |
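
The learning rate schedule and regret initialization quoted in the Experiment Setup entry can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `learning_rate` and `init_regrets` are hypothetical, epochs are assumed to be zero-indexed with the first halving at epoch 5, and the regret initialization uses only Python's standard `random` module rather than any OpenSpiel API.

```python
import random


def learning_rate(epoch: int,
                  start_exp: int = -6,
                  end_exp: int = -17,
                  halve_every: int = 5) -> float:
    """Step schedule: start at 2**start_exp, halve every `halve_every`
    epochs, and stay at 2**end_exp once that floor is reached.

    Assumption: epochs are zero-indexed, so epochs 0-4 use 2**-6,
    epochs 5-9 use 2**-7, and so on.
    """
    exponent = max(start_exp - epoch // halve_every, end_exp)
    return 2.0 ** exponent


def init_regrets(num_actions: int, rng: random.Random,
                 low: float = 0.0, high: float = 0.001) -> list[float]:
    """Draw one initial regret per action uniformly from [low, high].

    Hypothetical helper: it mirrors the quoted description only and is
    not the OpenSpiel implementation.
    """
    return [rng.uniform(low, high) for _ in range(num_actions)]


if __name__ == "__main__":
    for epoch in (0, 5, 10, 55, 199):
        print(epoch, learning_rate(epoch))
    print(init_regrets(3, random.Random(0)))
```

Under these assumptions the floor of 2^{-17} is reached after 11 halvings, at epoch 55, and the rate stays constant from there on.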