Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
$O(T^{-1})$ Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games
Authors: Yuepeng Yang, Cong Ma
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that optimistic-follow-the-regularized-leader (OFTRL), together with smooth value updates, finds an $O(T^{-1})$-approximate Nash equilibrium in T iterations for two-player zero-sum Markov games with full information. This improves the $O(T^{-5/6})$ convergence rate recently shown by Zhang et al. (2022b). The refined analysis hinges on two essential ingredients. |
| Researcher Affiliation | Academia | Department of Statistics, University of Chicago; Email: EMAIL |
| Pseudocode | Yes | Algorithm 1 Optimistic-follow-the-regularized-leader for solving two-player zero-sum Markov games |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, so it does not specify any dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets, so it does not provide dataset split information. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments that would require specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings. |
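Since the paper provides no code, the following is a minimal illustrative sketch of optimistic FTRL with an entropy regularizer (i.e. optimistic hedge) on a two-player zero-sum *matrix* game. It is not the paper's Markov-game algorithm: the smooth value updates and the full Markov-game structure are omitted, and the step size `eta` and iteration count are arbitrary choices for the demo.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def optimistic_hedge(A, T=2000, eta=0.1):
    """Optimistic FTRL (entropy regularizer) for min_x max_y x^T A y.

    Optimism here means the last observed loss is used as the prediction
    of the next one, so it is counted twice in the FTRL update.
    Returns the time-averaged strategies of both players.
    """
    m, n = A.shape
    Lx = np.zeros(m)            # cumulative losses for the min player
    Ly = np.zeros(n)            # cumulative gains for the max player
    gx, gy = np.zeros(m), np.zeros(n)  # last-round feedback (optimistic term)
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x = softmax(-eta * (Lx + gx))  # min player: low loss -> high weight
        y = softmax(eta * (Ly + gy))   # max player: high gain -> high weight
        gx, gy = A @ y, A.T @ x        # realized losses/gains this round
        Lx += gx
        Ly += gy
        avg_x += x
        avg_y += y
    return avg_x / T, avg_y / T
```

For example, on matching pennies (`A = [[1, -1], [-1, 1]]`) the averaged strategies approach the uniform equilibrium, and the duality gap `max(x̄ᵀA) − min(Aȳ)` shrinks toward zero as T grows.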