Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications
Authors: Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Somewhat surprisingly, we show in this paper that for a large family of games, there exists a simple black-box reduction that transforms the average iterates of an uncoupled learning dynamics into the last iterates of a new uncoupled learning dynamics, thus also providing a reduction from last-iterate convergence to average-iterate convergence. Our reduction applies to games where each player s utility is linear in both their own strategy and the joint strategy of all opponents. This family includes two-player bimatrix games and generalizations such as multi-player polymatrix games. By applying our reduction to the Optimistic Multiplicative Weights Update algorithm, we obtain new state-of-the-art last-iterate convergence rates for uncoupled learning dynamics in multi-player zero-sum polymatrix games: (1) an O( log d T ) last-iterate convergence rate under gradient feedback, representing an exponential improvement in the dependence on the dimension d (i.e., the maximum number of actions available to either player); and (2) an O(d 1 5 T 1 5 ) last-iterate convergence rate under bandit feedback, improving upon the previous best rates of O( 1 Introduction The convergence of online learning algorithms in games under self-play is a fundamental question in both game theory and machine learning. |
| Researcher Affiliation | Academia | Yang Cai Yale University EMAIL Haipeng Luo University of Southern California EMAIL Chen-Yu Wei University of Virginia EMAIL Weiqiang Zheng Yale University EMAIL |
| Pseudocode | Yes | Algorithm 1: A2L: Black-Box Reduction from Average to Last-Iterate and Algorithm 2: A2L-OMWU-Bandit: OMWU with bandit feedback and A2L reduction (for player i) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on game theory and learning dynamics. It does not describe any experiments that would use publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, thus no dataset split information is provided. |
| Hardware Specification | No | The paper is theoretical and does not involve empirical experiments, therefore no specific hardware details are provided. |
| Software Dependencies | No | The paper is theoretical and does not involve empirical experiments, therefore no specific software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper is theoretical and does not involve empirical experiments, therefore no specific experimental setup details such as hyperparameter values or training configurations are provided. |