Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Policy Gradient Methods Converge Globally in Imperfect-Information Extensive-Form Games

Authors: Fivos Kalogiannis, Gabriele Farina

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental To corroborate our theoretical results, we tested Alt-Reg NPG on four different imperfect information EFGs (Kuhn Poker, Leduc Poker, 2×2 Abrupt Dark Hex and Liar's Dice). Inspired by MMD (Sokota et al., 2022), we implement two variants of Alt-Reg NPG where (i) the regularization strength diminishes across time along the stepsizes and (ii) the regularizer is the discounted KL divergence from a moving reference policy. We observe that the exploitability (i.e. $\max_{\pi_1'} V^{\pi_1', \pi_2} - \min_{\pi_2'} V^{\pi_1, \pi_2'}$) diminishes across time for our method, and it compares well with CFR and MMD.
Researcher Affiliation Academia Fivos Kalogiannis UCSD CSE La Jolla, CA 92093 EMAIL Gabriele Farina MIT EECS Cambridge, MA 02139 EMAIL
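Pseudocode No The parameter updates of alternating regularized policy gradient take the following form: $x_{t+1} = \mathrm{Proj}_{\mathcal{X}^{\varepsilon}}\big[x_t - \eta_x \hat{\nabla}^{\tau}_{x}(x_t, y_t)\big]$, $y_{t+1} = \mathrm{Proj}_{\mathcal{Y}^{\varepsilon}}\big[y_t + \eta_y \hat{\nabla}^{\tau}_{y}(x_{t+1}, y_t)\big]$. (Alt-Reg PG) ... $\chi_{t+1} = \mathrm{Proj}_{\mathcal{X}_{R}}\big[\chi_t - \eta_x \hat{\nabla}^{\tau}_{\chi}(\chi_t, \theta_t)\big]$; $\theta_{t+1} = \mathrm{Proj}_{\Theta_{R}}\big[\theta_t + \eta_y \hat{\nabla}^{\tau}_{\theta}(\chi_{t+1}, \theta_t)\big]$. (Alt-Ent Reg PG)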
Open Source Code Yes 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: code and proofs are in the supplemental material
Open Datasets Yes To corroborate our theoretical results, we tested Alt-Reg NPG on four different imperfect information EFGs (Kuhn Poker, Leduc Poker, 2×2 Abrupt Dark Hex and Liar's Dice).
Dataset Splits No The paper does not specify any dataset splits for training, validation, or testing for the mentioned game environments (Kuhn Poker, Leduc Poker, Abrupt Dark Hex, Liar's Dice).
Hardware Specification Yes 8. Experiments compute resources Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: description of laptop
Software Dependencies No The paper does not explicitly list specific software components with version numbers in its main text.
Experiment Setup No The paper describes testing Alt-Reg NPG on four different imperfect information EFGs and implementing two variants: one in which the regularization strength diminishes over time, and one in which the regularizer is the discounted KL divergence from a moving reference policy. However, it does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or detailed training configurations in the main text.
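
For readers unfamiliar with the quantities quoted in the table, the alternating update in the Pseudocode row and the exploitability measure in the Research Type row can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a normal-form matrix game (Rock-Paper-Scissors) instead of an extensive-form game, exact gradients instead of the estimated ones, an entropy regularizer of strength tau, and a strategy set truncated so that every probability is at least eps. The function names (`project_truncated_simplex`, `exploitability`, `alt_reg_pg`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def project_truncated_simplex(v, eps):
    """Euclidean projection of v onto {w : w_i >= eps, sum(w) = 1}."""
    n = len(v)
    radius = 1.0 - n * eps              # mass left after reserving eps per entry
    shifted = [vi - eps for vi in v]
    theta, cumulative = 0.0, 0.0
    # Sort-based projection onto the simplex of the given radius.
    for j, uj in enumerate(sorted(shifted, reverse=True), start=1):
        cumulative += uj
        t = (cumulative - radius) / j
        if uj - t > 0:
            theta = t
    return [max(si - theta, 0.0) + eps for si in shifted]


def matvec(A, y):
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def rmatvec(A, x):
    return [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(A[0]))]


def exploitability(A, x, y):
    """Best-response gap when x minimizes and y maximizes x^T A y."""
    return max(rmatvec(A, x)) - min(matvec(A, y))


def alt_reg_pg(A, x, y, eta=0.05, tau=0.05, eps=0.01, steps=6000):
    """Alternating projected gradient descent-ascent with entropy regularization."""
    for _ in range(steps):
        # Minimizing player: descend on x^T A y + tau * sum(x log x).
        gx = [g + tau * (math.log(xi) + 1.0) for g, xi in zip(matvec(A, y), x)]
        x = project_truncated_simplex([xi - eta * g for xi, g in zip(x, gx)], eps)
        # Maximizing player: ascend using the already-updated x (the alternation).
        gy = [g - tau * (math.log(yi) + 1.0) for g, yi in zip(rmatvec(A, x), y)]
        y = project_truncated_simplex([yi + eta * g for yi, g in zip(y, gy)], eps)
    return x, y


# Rock-Paper-Scissors payoff matrix (assumed toy game, not from the paper).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
x0, y0 = [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]
x_final, y_final = alt_reg_pg(RPS, x0, y0)
print(exploitability(RPS, x0, y0), exploitability(RPS, x_final, y_final))
```

In this symmetric game the regularized solution coincides with the uniform equilibrium, so the exploitability shrinks toward zero. In a general game a fixed tau leaves a bias in the solution, which is one reason to decay the regularization strength or to move the reference policy, as the two variants described in the table do.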