Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Optimism Without Regularization: Constant Regret in Zero-Sum Games
Authors: John Lazarsfeld, Georgios Piliouras, Ryann Sim, Stratis Skoulakis
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Apart from our theoretical results, we also experimentally evaluate Optimistic FP on higher-dimensional zero-sum games, and these evaluations suggest that, even for much larger games, Optimistic FP still obtains constant regret. |
| Researcher Affiliation | Academia | John Lazarsfeld SUTD Georgios Piliouras SUTD Ryann Sim SUTD Stratis Skoulakis Aarhus University |
| Pseudocode | Yes | Optimistic FP can be equivalently written with respect to the cumulative payoff vectors yt 1 = Pt 1 k=0 Axk 2 Rm and yt 2 = Pt 1 k=0 A xk 1 Rn. Specifically, the iterates of the algorithm can be expressed in the following primal-dual form: Definition 2.3. Let y0 1 = 0 Rm and y0 2 = 0 Rn, and fix any initial x0 1 m and x0 2 n. Then for t 1, the dual (i.e., (yt 1, yt 2)) and primal (i.e., (xt 1, xt 2)) iterates of Optimistic FP are: yt 1 = yt 1 1 + Axt 1 2 yt 2 = yt 1 2 A xt 1 1 and xt 1 = argmax x {ei}m x, yt 1 + Axt 1 2 xt 2 = argmax x {ei}n x, yt 2 A xt 1 1 . (OFP) |
| Open Source Code | Yes | First, we note that all code used to run experiments can be found in the supplementary material. |
| Open Datasets | No | Families of payoff matrices. Aside from the (Matching Pennies) game, our experimental evaluations of Fictitious Play variants are performed on three high-dimensional families of payoff matrices: Identity matrices: Here, the payoff matrix is the n n identity matrix In (i.e., the diagonal matrix with diagonal entries all 1). Generalized Rock-Paper-Scissors (RPS) matrices: Here, the payoff matrix is the n n generalization of the classic three-strategy Rock-Paper-Scissors game. Random [0,1] matrices: We also consider n n payoff matrices with uniformly random entries in [0, 1]. |
| Dataset Splits | No | The paper describes generating synthetic data based on specified matrix types (Identity, RPS, Random [0,1]) and performing simulations for a fixed number of iterations (T = 10000) from multiple random initializations. It does not refer to standard training, validation, or test dataset splits typically used with pre-existing datasets. |
| Hardware Specification | No | In this paper, all experiments were run locally on a single personal computer. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | Yes | The table shows the empirical regret of Optimistic FP and standard FP (using lexicographical tiebreaking) on three classes of zero-sum games, in three higher dimensional settings. For each setting, the algorithms were run from 100 random initializations, each for T = 10000 iterations, and we report the average regret over all initializations. |