Stochastic No-regret Learning for General Games with Variance Reduction
Authors: Yichi Zhou, Fang Kong, Shuai Li
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show that a stochastic version of optimistic mirror descent (OMD), a variant of mirror descent with recency bias, converges fast in general games. More specifically, with our algorithm, the individual regret of each player vanishes at a speed of O(1/T^{3/4}) and the sum of all players' regrets vanishes at a speed of O(1/T), which is an improvement upon the O(1/√T) convergence rate of prior stochastic algorithms, where T is the number of interaction rounds. Owing to the computational advantage of stochastic methods, we significantly improve on the time complexity of deterministic algorithms for approximating coarse correlated equilibrium. To achieve this lower time complexity, we equip the stochastic version of OMD in (AM21) with a novel low-variance Monte Carlo estimator. Our algorithm extends previous works (AM21; CJST19) from two-player zero-sum games to general games. The rest of the paper is organized as follows: In Section 4, we introduce our algorithm and present a general regret upper bound in Theorem 1. In Section 5, we introduce our low-variance Monte Carlo estimator and analyze its variance in Lemma 3. In Section 6, by combining the results in Theorem 1 and Lemma 3, we present our final regret bounds in Theorems 2 and 3, as well as the time complexity to approximate coarse correlated equilibrium in Corollaries 1 and 2. |
| Researcher Affiliation | Collaboration | Yichi Zhou Microsoft Research, AI4Science, Asia Beijing, China yiczho@microsoft.com Fang Kong & Shuai Li Shanghai Jiao Tong University Shanghai, China {fangkong,shuaili8}@sjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1 Optimistic mirror descent with variance reduction — 1: Input: hyper-parameters p, τ and α; 2: Initialize: w_i^1(a) = 1/A and σ_i^1(a) = 1/A; 3: for k = 1, 2, ..., T do; 4: Sample u from the uniform distribution over [0, 1]; 5: for i = 1, ..., N do; 6: Compute σ̂_i^k such that ∇h(σ̂_i^k) = α ∇h(σ_i^k) + (1 − α) ∇h(w_i^k); 7: Sample a_i^k ~ LVE(i, σ^k, w^{k−1}) {see Section 5 for the definition of LVE}; 8: Compute σ̃_i^{k+1} such that ∇h(σ̃_i^{k+1}) = ∇h(σ̂_i^k) − τ (F_i(w_i^k) + F_{i,a_i^k}(σ_i^k) − F_{i,a_i^k}(w_i^{k−1})); 9: Compute σ_i^{k+1} = argmin_{σ_i ∈ Δ(A_i)} D(σ_i, σ̃_i^{k+1}); 10: Update w_i^{k+1} = σ_i^{k+1} if u < p and w_i^{k+1} = w_i^k if u ≥ p; 11: end for; 12: end for |
| Open Source Code | No | The paper does not contain any statement about making the source code open, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper is theoretical and does not involve the use of datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not describe any validation data or splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not provide details about an experimental setup or hyperparameters. |
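The OMD loop in Algorithm 1 can be sketched numerically. The sketch below is illustrative only and makes several simplifying assumptions not taken from the paper: it restricts to two-player normal-form games, uses the negative-entropy regularizer h(x) = Σ x log x (so the mirror steps become multiplicative weight updates and the Bregman projection onto the simplex is a normalization), and replaces the paper's low-variance Monte Carlo estimator LVE with the exact expected payoff gradient, so the variance-reduction correction term vanishes. The function name `omd_vr_two_player` is hypothetical.

```python
import numpy as np

def omd_vr_two_player(A, B, T=2000, tau=0.1, alpha=0.5, p=0.5, seed=0):
    """Sketch of Algorithm 1 for a two-player game with payoff
    matrices A (row player) and B (column player), both n x m.
    Entropy regularizer assumed; LVE replaced by exact gradients."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    # Line 2: initialize both sigma^1 and the anchor w^1 to uniform.
    sigma = [np.full(n, 1 / n), np.full(m, 1 / m)]
    w = [s.copy() for s in sigma]
    avg = [np.zeros(n), np.zeros(m)]
    for _ in range(T):
        u = rng.random()  # line 4: shared randomness for the lazy anchor update
        # Expected payoff gradients at the anchor w (stand-in for LVE samples).
        g = [A @ w[1], B.T @ w[0]]
        new_sigma = []
        for i in range(2):
            # Line 6: grad h(sigma_hat) = alpha grad h(sigma) + (1-alpha) grad h(w)
            # reduces to a geometric mixture under the entropy regularizer.
            hat = sigma[i] ** alpha * w[i] ** (1 - alpha)
            hat /= hat.sum()
            # Lines 8-9: mirror step plus Bregman (KL) projection, i.e. a
            # multiplicative update followed by normalization (maximization form).
            s = hat * np.exp(tau * g[i])
            s /= s.sum()
            new_sigma.append(s)
        sigma = new_sigma
        # Line 10: refresh the anchor w with probability p.
        if u < p:
            w = [s.copy() for s in sigma]
        for i in range(2):
            avg[i] += sigma[i]
    # Return the time-averaged strategies (the empirical-play profile
    # whose regret the paper's bounds control).
    return [a / T for a in avg]
```

On a game with a dominant action for each player, the averaged strategies concentrate on those actions; on rock-paper-scissors from the uniform start, the gradients vanish and play stays at the uniform equilibrium.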