Stochastic No-regret Learning for General Games with Variance Reduction

Authors: Yichi Zhou, Fang Kong, Shuai Li

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show that a stochastic version of optimistic mirror descent (OMD), a variant of mirror descent with recency bias, converges fast in general games. More specifically, with our algorithm, the individual regret of each player vanishes at a speed of O(1/T^{3/4}) and the sum of all players' regret vanishes at a speed of O(1/T), which is an improvement upon the O(1/T^{1/2}) convergence rate of prior stochastic algorithms, where T is the number of interaction rounds. Owing to the computational advantage of stochastic methods, we significantly improve the time complexity over deterministic algorithms for approximating coarse correlated equilibrium. To achieve lower time complexity, we equip the stochastic version of OMD in (AM21) with a novel low-variance Monte Carlo estimator. Our algorithm extends previous works (AM21; CJST19) from two-player zero-sum games to general games. The rest of the paper is organized as follows: In Section 4, we introduce our algorithm and present a general regret upper bound in Theorem 1. In Section 5, we introduce our low-variance Monte Carlo estimator and analyze its variance in Lemma 3. In Section 6, by combining the results in Theorem 1 and Lemma 3, we present our final regret bounds in Theorems 2 and 3, as well as the time complexity to approximate coarse correlated equilibrium in Corollaries 1 and 2. (A hedged sketch of how these regret rates relate to a coarse correlated equilibrium guarantee appears after the table below.)
Researcher Affiliation | Collaboration | Yichi Zhou, Microsoft Research AI4Science Asia, Beijing, China, yiczho@microsoft.com; Fang Kong & Shuai Li, Shanghai Jiao Tong University, Shanghai, China, {fangkong,shuaili8}@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1: Optimistic mirror descent with variance reduction.
1: Input: hyper-parameters p, τ and α.
2: Initialize: w_i^1(a) = 1/A and σ_i^1(a) = 1/A.
3: for k = 1, 2, ..., T do
4:   Sample u from the uniform distribution over [0, 1].
5:   for i = 1, ..., N do
6:     Compute σ̂_i^k such that ∇h(σ̂_i^k) = α ∇h(σ_i^k) + (1 − α) ∇h(w_i^k).
7:     Sample a_i^k ∼ LVE(i, σ^k, w^{k−1}). {See Section 5 for the definition of LVE.}
8:     Compute σ̃_i^{k+1} such that ∇h(σ̃_i^{k+1}) = ∇h(σ̂_i^k) − τ (F_i(w_i^k) + F_{i,a_i^k}(σ_i^k) − F_{i,a_i^k}(w_i^{k−1})).
9:     Compute σ_i^{k+1} = argmin_{σ_i ∈ Δ(A_i)} D(σ_i, σ̃_i^{k+1}).
10:    Update w_i^{k+1} = σ_i^{k+1} if u < p and w_i^{k+1} = w_i^k if u ≥ p.
11:  end for
12: end for
(A hedged Python sketch of this procedure appears after the table below.)
Open Source Code | No | The paper does not contain any statement about making the source code open, nor does it provide a link to a code repository.
Open Datasets | No | The paper is theoretical and does not involve the use of datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe any validation data or splits.
Hardware Specification | No | The paper is theoretical and does not describe any hardware specifications used for experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not provide details about an experimental setup or hyperparameters.
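
As context for the rates quoted in the Research Type row above, the following is a minimal LaTeX sketch, assuming the standard definition of external regret and the usual no-regret-to-CCE argument; the paper's own definitions (given in its Section 4) may differ in normalization.

% Hedged sketch, not quoted from the paper: standard external regret of player i
% over T rounds, where u_i is player i's utility and sigma^k is the mixed-strategy
% profile played at round k.
\[
  \mathrm{Reg}_i^T \;=\; \max_{\sigma_i' \in \Delta(A_i)} \sum_{k=1}^{T}
  \Bigl( u_i(\sigma_i', \sigma_{-i}^k) - u_i(\sigma_i^k, \sigma_{-i}^k) \Bigr).
\]
% "Individual regret vanishes at a speed of O(1/T^{3/4})" then reads as
% Reg_i^T / T = O(T^{-3/4}), and "the sum of all players' regret vanishes at a
% speed of O(1/T)" reads as (1/T) * sum_i Reg_i^T = O(1/T), i.e. the total regret
% stays bounded by a constant.  By the standard argument, the average of the
% joint strategy profiles sigma^1, ..., sigma^T is then an eps-approximate
% coarse correlated equilibrium with eps <= max_i Reg_i^T / T.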
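
The Pseudocode row above can also be read as executable code. Below is a minimal Python sketch of Algorithm 1, not the authors' implementation: the mirror map h is assumed to be negative entropy (so the dual-space updates become log-space updates and the Bregman projection onto the simplex is a normalization), the game is supplied through two hypothetical callables grad(i, profile) and grad_sample(i, actions, profile), and the paper's low-variance estimator LVE (defined in its Section 5, not reproduced here) is replaced by a plain Monte Carlo draw from the current strategies.

import numpy as np

def omd_with_variance_reduction(grad, grad_sample, num_players, num_actions,
                                T=1000, p=0.1, tau=0.1, alpha=0.5, seed=0):
    """Hedged sketch of Algorithm 1 (optimistic mirror descent with variance
    reduction).  Assumptions not taken from the paper:
      * the mirror map h is negative entropy, so the dual updates act on
        log sigma and the Bregman (KL) projection is a normalization;
      * grad(i, profile) returns player i's exact loss-gradient vector F_i at a
        joint mixed-strategy profile (list of arrays, one per player);
      * grad_sample(i, actions, profile) returns a single-sample estimate of
        F_i(profile) given a sampled joint action, standing in for the paper's
        LVE-based estimator.
    """
    rng = np.random.default_rng(seed)
    # Step 2: uniform initialization, w_i^1(a) = sigma_i^1(a) = 1/A.
    w = [np.full(num_actions, 1.0 / num_actions) for _ in range(num_players)]
    sigma = [wi.copy() for wi in w]
    w_prev = [wi.copy() for wi in w]  # stands in for w^{k-1} (take w^0 = w^1)

    def project(log_sigma):
        # Step 9: KL/Bregman projection onto the simplex under negative
        # entropy, i.e. exponentiate and normalize.
        z = np.exp(log_sigma - log_sigma.max())
        return z / z.sum()

    for k in range(T):
        u = rng.uniform()  # Step 4: shared coin for the snapshot refresh
        new_sigma, new_w = [], []
        for i in range(num_players):
            # Step 6: dual-space combination, grad h(sigma_hat) =
            # alpha*grad h(sigma^k) + (1-alpha)*grad h(w^k); the additive
            # constant of grad h cancels in the projection, so logs suffice.
            log_hat = alpha * np.log(sigma[i]) + (1.0 - alpha) * np.log(w[i])
            # Step 7: plain Monte Carlo stand-in for LVE(i, sigma^k, w^{k-1}).
            actions = [int(rng.choice(num_actions, p=sigma[j]))
                       for j in range(num_players)]
            # Step 8: variance-reduced gradient
            # F_i(w^k) + F_{i,a}(sigma^k) - F_{i,a}(w^{k-1}).
            g = (np.asarray(grad(i, w))
                 + np.asarray(grad_sample(i, actions, sigma))
                 - np.asarray(grad_sample(i, actions, w_prev)))
            log_tilde = log_hat - tau * g
            new_sigma.append(project(log_tilde))
            # Step 10: refresh the snapshot w with probability p.
            new_w.append(new_sigma[-1] if u < p else w[i])
        sigma, w_prev, w = new_sigma, w, new_w
    return sigma

For a concrete two-player game, grad(i, profile) could return the expected loss gradient computed from a payoff matrix and grad_sample(i, actions, profile) an importance-weighted single-column estimate of it; both names and signatures are placeholders for illustration, not notation from the paper.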