Near-Optimal Learning of Extensive-Form Games with Imperfect Information

Authors: Yu Bai, Chi Jin, Song Mei, Tiancheng Yu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only Õ((XA + YB)/ε²) episodes of play to find an ε-approximate Nash equilibrium in two-player zero-sum games... We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating balanced exploration policies into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multiplayer general-sum games.
Researcher Affiliation | Collaboration | Yu Bai (Salesforce Research), Chi Jin (Princeton University), Song Mei (UC Berkeley), Tiancheng Yu (MIT).
Pseudocode | Yes | Algorithm 1: Balanced OMD (max-player); Algorithm 2: Balanced CFR (max-player); Algorithm 3: Regret Minimization with Hedge (HEDGE); Algorithm 4: Regret Minimization with Regret Matching (REGRETMATCHING); Algorithm 5: Implementation of the Balanced OMD update; Algorithm 6: CFR with Hedge (full-feedback setting).
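For orientation, the HEDGE subroutine referenced in Algorithm 3 is the classical exponential-weights regret minimizer. The sketch below is a generic textbook Hedge update in Python, not the paper's balanced variant; the function name `hedge` and its interface are our own illustrative choices.

```python
import numpy as np

def hedge(loss_sequence, eta):
    """Generic exponential-weights (Hedge) regret minimizer.

    loss_sequence: iterable of per-round loss vectors with entries in [0, 1].
    eta: learning rate.
    Returns the list of probability distributions played, one per round.
    """
    losses = [np.asarray(l, dtype=float) for l in loss_sequence]
    cum_loss = np.zeros(len(losses[0]))
    played = []
    for loss in losses:
        # Play the distribution proportional to exp(-eta * cumulative loss).
        weights = np.exp(-eta * cum_loss)
        played.append(weights / weights.sum())
        cum_loss += loss
    return played
```

Under repeated losses that favor one action, the played distributions concentrate on that action; the paper's balanced algorithms additionally mix in balanced exploration policies, which this sketch omits.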
Open Source Code | No | The paper provides no statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and sample-complexity analysis; it conducts no experiments on data, so no training dataset or its accessibility is mentioned.
Dataset Splits | No | The paper is theoretical and reports no experiments, so no train/validation/test splits are described.
Hardware Specification | No | The paper is theoretical and describes no experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and describes no experimental setup with specific software dependencies or version numbers.
Experiment Setup | No | The paper is theoretical and describes no experimental setup, hyperparameters, or training configurations.