Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Authors: Shicong Cen, Yuting Wei, Yuejie Chi

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1: Performance illustration of the PU and OMWU methods for solving entropy-regularized matrix games with |A| = |B| = 100, where the entries of the payoff matrix A are generated independently from the uniform distribution on [−1, 1]. The learning rates are fixed as η = 0.1. The left panel plots various error metrics of convergence w.r.t. the iteration count with τ = 0.01, while the right panel plots these error metrics at the 1000th iteration for different choices of τ.
Researcher Affiliation | Academia | Shicong Cen, Carnegie Mellon University (shicongc@andrew.cmu.edu); Yuting Wei, University of Pennsylvania (ytwei@wharton.upenn.edu); Yuejie Chi, Carnegie Mellon University (yuejiechi@cmu.edu)
Pseudocode | Yes | Algorithm 1: The PU method; Algorithm 2: The OMWU method; Algorithm 3: Policy Extragradient Method for Entropy-regularized Markov Game
Open Source Code | No | The paper does not provide links to open-source code or any explicit statement about code availability for the described methodology.
Open Datasets | No | The paper uses synthetic data generated internally for its performance illustration (Figure 1), stating that 'the entries of the payoff matrix A is generated independently from the uniform distribution on [−1, 1]'. It does not use, or provide access information for, a publicly available dataset.
Dataset Splits | No | The paper's performance illustration uses synthetically generated data and does not specify training, validation, or test splits. The focus is on theoretical convergence rates demonstrated with this generated data.
Hardware Specification | No | The paper does not provide any details about the hardware used to run its experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | 'The learning rates are fixed as η = 0.1.'
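Since the Figure 1 experiment is fully specified by a handful of parameters, its synthetic setup can be reconstructed in a few lines. The sketch below implements an optimistic multiplicative-weights (OMWU-style) extragradient update for the entropy-regularized matrix game max_μ min_ν μᵀAν + τH(μ) − τH(ν), using the reported parameters (|A| = |B| = 100, entries of A i.i.d. uniform on [−1, 1], η = 0.1, τ = 0.01). This is our reading of the setup, not the paper's verbatim Algorithm 2, and the duality-gap metric used here is one plausible choice among the several error metrics Figure 1 tracks.

```python
# Hypothetical reconstruction of the Figure 1 setup (a sketch, not the
# paper's verbatim Algorithm 2): OMWU-style extragradient updates for the
# entropy-regularized matrix game  max_mu min_nu  mu^T A nu + tau*H(mu) - tau*H(nu).
import numpy as np

rng = np.random.default_rng(0)
n = 100                                   # |A| = |B| = 100 as in Figure 1
A = rng.uniform(-1.0, 1.0, size=(n, n))   # payoff entries i.i.d. uniform on [-1, 1]
eta, tau = 0.1, 0.01                      # learning rate and regularization strength

def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def mwu_step(p, grad):
    """One entropy-regularized multiplicative-weights step, in log space:
    p_new(a) proportional to p(a)^(1 - eta*tau) * exp(grad_a)."""
    logits = (1.0 - eta * tau) * np.log(p) + grad
    logits -= logits.max()                # shift for numerical stability
    q = np.exp(logits)
    return q / q.sum()

def reg_gap(mu, nu):
    """Duality gap of the regularized game: each player's soft best-response
    value, computed via logsumexp; zero exactly at the regularized equilibrium."""
    return tau * logsumexp(A @ nu / tau) + tau * logsumexp(-(A.T @ mu) / tau)

mu = np.full(n, 1.0 / n)                  # uniform initial policies
nu = np.full(n, 1.0 / n)
mu_bar, nu_bar = mu.copy(), nu.copy()
gap0 = reg_gap(mu, nu)

for _ in range(1000):
    # Extragradient structure: a prediction ("bar") step, then the actual
    # update against the predicted opponent play.
    mu_bar_new = mwu_step(mu, eta * (A @ nu_bar))
    nu_bar_new = mwu_step(nu, -eta * (A.T @ mu_bar))
    mu = mwu_step(mu, eta * (A @ nu_bar_new))
    nu = mwu_step(nu, -eta * (A.T @ mu_bar_new))
    mu_bar, nu_bar = mu_bar_new, nu_bar_new

gap_T = reg_gap(mu, nu)
print(f"duality gap: {gap0:.4f} -> {gap_T:.2e}")
```

Working in log space is what keeps the soft best-response values finite: with τ = 0.01, a naive exp(Aν/τ) would overflow, which is why both the update and the gap computation route through logsumexp.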