Polynomial-Time Linear-Swap Regret Minimization in Imperfect-Information Sequential Games

Authors: Gabriele Farina, Charilaos Pipis

NeurIPS 2023

Reproducibility assessment. Each entry below gives the variable, the extracted result, and the supporting LLM response.

Research Type: Experimental
LLM Response (from the paper's "Empirical evaluation" section): "To further illustrate the separation between no-linear-swap-regret dynamics and no-trigger-regret dynamics, used for EFCE, we provide experimental evidence that minimizing linear-swap regret also minimizes trigger regret (Figure 2, left), while minimizing trigger regret does not minimize linear-swap regret. Specifically, in Figure 2 we compare our no-linear-swap-regret learning dynamics (given in Algorithm 1) to the no-trigger-regret algorithm introduced by Farina et al. [2022]." The first direction is expected, since trigger deviations are themselves linear and hence form a subset of the linear deviations.

Researcher Affiliation: Academia
LLM Response: "Gabriele Farina, MIT, gfarina@mit.edu; Charilaos Pipis, MIT, chpipis@mit.edu"

Pseudocode: Yes
LLM Response: "Algorithm 1: Φ-regret minimizer for the set Φ = M_{Q→Q}" (see the illustrative sketch after this list).

Open Source Code: No
LLM Response: The paper states "We implemented our no-linear-swap-regret algorithm (Algorithm 1) in the C++ programming language using the Gurobi commercial optimization solver [Gurobi Optimization, LLC, 2023], version 10." but does not mention making the implementation's source code openly available.

Open Datasets: Yes
LLM Response (game instance used): "We ran our code on the standard benchmark game of Kuhn poker [Kuhn, 1950]. We used a three-player variant of the game."

Dataset Splits: No
LLM Response: The paper describes running experiments on a game instance (Kuhn poker) but does not provide explicit training, validation, or test splits, as would typically be reported for machine learning models trained on static datasets. The setting is a game simulation, so dataset splits do not apply.

Hardware Specification: No
LLM Response: "Minimal computational resources were used. All code ran on a personal laptop for roughly 12 hours."

Software Dependencies: Yes
LLM Response: "We implemented our no-linear-swap-regret algorithm (Algorithm 1) in the C++ programming language using the Gurobi commercial optimization solver [Gurobi Optimization, LLC, 2023], version 10."

Experiment Setup: Yes
LLM Response: "We did very minimal tuning of the constant learning rate η used for online projected gradient descent, trying values η ∈ {0.05, 0.1, 0.5} (we remark that a constant value of η ∝ 1/√T is theoretically sound). We found that η = 0.1, which is used in the plots of Figure 2, performed best." (See the step-size note below.)
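
To make the Pseudocode row concrete: Algorithm 1 follows the general Φ-regret-minimization template of Gordon, Greenwald, and Marks [2008], which runs an external regret minimizer (here, online projected gradient descent) over the set Φ of transformations and, at each round, plays a fixed point of the currently selected transformation. The sketch below is ours, not the authors' C++/Gurobi implementation: it illustrates the template on a toy domain (the probability simplex, with Φ taken as the column-stochastic matrices) instead of the paper's sequence-form polytope Q and set M_{Q→Q}, and all names (phi_regret_dynamics, project_simplex, stationary_point) are assumptions made for illustration.

    import numpy as np

    def project_simplex(v):
        # Euclidean projection of a vector onto the probability simplex.
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        idx = np.arange(1, len(v) + 1)
        rho = np.nonzero(u - css / idx > 0)[0][-1]
        return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

    def stationary_point(M):
        # A point x on the simplex with x = M x; one exists whenever M is
        # column-stochastic. Solved as a least-squares system for simplicity.
        n = M.shape[0]
        A = np.vstack([M - np.eye(n), np.ones((1, n))])
        b = np.zeros(n + 1)
        b[-1] = 1.0
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return project_simplex(x)  # clean up numerical noise

    def phi_regret_dynamics(utilities, eta=0.1):
        # Generic Phi-regret template: maintain a transformation M via online
        # projected gradient ascent over Phi, and each round play a fixed
        # point of M. utilities has shape (T, n), one utility vector per round.
        T, n = utilities.shape
        M = np.full((n, n), 1.0 / n)  # start from the uniform transformation
        plays = []
        for u in utilities:
            x = stationary_point(M)   # play x satisfying x = M x
            plays.append(x)
            # The regret minimizer over Phi sees the linear utility
            # M -> <u, M x>, whose gradient with respect to M is u x^T.
            M = M + eta * np.outer(u, x)
            M = np.apply_along_axis(project_simplex, 0, M)  # re-project columns
        return plays

In the paper's actual setting, the fixed-point and projection steps are over a polytope of linear maps sending Q into itself, which is consistent with the authors' reported use of an LP solver (Gurobi); the toy sketch above only shows the control flow.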
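
On the Experiment Setup row: the remark that a constant η on the order of 1/√T is theoretically sound is the standard guarantee for online projected gradient descent [Zinkevich, 2003]. In our notation (not the paper's), with D the diameter of the feasible set X, G a bound on the utility-gradient norms, and Π_X the Euclidean projection:

    x_{t+1} = \Pi_{\mathcal{X}}\left( x_t + \eta\, u_t \right),
    \qquad
    \max_{x \in \mathcal{X}} \sum_{t=1}^{T} \langle u_t,\, x - x_t \rangle
    \;\le\; \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2}
    \;=\; D G \sqrt{T}
    \quad \text{for } \eta = \frac{D}{G\sqrt{T}}.

The paper sidesteps computing these constants by tuning η over the small grid {0.05, 0.1, 0.5} and selecting η = 0.1.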