Near-Optimal Learning of Extensive-Form Games with Imperfect Information
Authors: Yu Bai, Chi Jin, Song Mei, Tiancheng Yu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only Õ((XA + YB)/ε²) episodes of play to find an ε-approximate Nash equilibrium in two-player zero-sum games... We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating balanced exploration policies into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multiplayer general-sum games. (The sample-complexity bound is restated in full notation after the table.) |
| Researcher Affiliation | Collaboration | Yu Bai* (Salesforce Research), Chi Jin* (Princeton University), Song Mei* (UC Berkeley), Tiancheng Yu* (MIT). |
| Pseudocode | Yes | Algorithm 1: Balanced OMD (max-player); Algorithm 2: Balanced CFR (max-player); Algorithm 3: Regret Minimization with Hedge (HEDGE); Algorithm 4: Regret Minimization with Regret Matching (REGRETMATCHING); Algorithm 5: Implementation of Balanced OMD update; Algorithm 6: CFR with Hedge (full-feedback setting). (A minimal sketch of the classical Hedge and Regret Matching updates appears after the table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and sample complexity analysis. It does not conduct experiments on a dataset, so there is no mention of a training dataset or its accessibility. |
| Dataset Splits | No | The paper is theoretical and does not report on experiments or data splits for reproduction. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe an experimental setup with specific software dependencies and version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, hyperparameters, or training configurations. |
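
For reference, the sample-complexity claim quoted in the Research Type row can be written out explicitly. In the paper's notation, X and Y are the numbers of information sets and A and B the numbers of actions of the two players; the display below only restates the abstract's bound and adds nothing new.

```latex
% Episodes of play sufficient to find an eps-approximate Nash equilibrium
% in a two-player zero-sum imperfect-information extensive-form game
% (X, Y = information sets; A, B = actions of the two players).
\[
  \widetilde{\mathcal{O}}\!\left(\frac{XA + YB}{\varepsilon^{2}}\right)
\]
```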
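
The Pseudocode row lists Hedge and Regret Matching (Algorithms 3 and 4) as the per-information-set regret-minimization subroutines. The sketch below is a minimal Python illustration of the classical versions of those two updates only; it is not the paper's Balanced OMD or Balanced CFR, and the function names, random-loss model, and learning-rate choice are illustrative assumptions.

```python
"""Minimal sketches of the two classical regret minimizers named in the
paper's pseudocode list (Hedge / exponential weights and Regret Matching).
These are generic textbook updates, not the paper's balanced variants."""

import numpy as np


def hedge_update(weights: np.ndarray, losses: np.ndarray, eta: float) -> np.ndarray:
    """Exponential-weights (Hedge) step: down-weight actions by their loss."""
    new_w = weights * np.exp(-eta * losses)
    return new_w / new_w.sum()  # renormalize to a probability distribution


def regret_matching_policy(cum_regret: np.ndarray) -> np.ndarray:
    """Regret Matching: play actions proportionally to positive cumulative regret."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No positive regret yet: fall back to the uniform policy.
    return np.full_like(cum_regret, 1.0 / len(cum_regret))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_actions, T = 4, 1000
    w = np.full(n_actions, 1.0 / n_actions)
    cum_regret = np.zeros(n_actions)
    for _ in range(T):
        losses = rng.uniform(size=n_actions)  # stand-in for observed losses
        w = hedge_update(w, losses, eta=np.sqrt(np.log(n_actions) / T))
        policy = regret_matching_policy(cum_regret)
        # Instantaneous regret of each action: policy's expected loss minus that action's loss.
        cum_regret += policy @ losses - losses
    print("Hedge policy:", np.round(w, 3))
    print("Regret-matching policy:", np.round(regret_matching_policy(cum_regret), 3))
```

Per the abstract, the paper's contribution lies in integrating balanced exploration policies into these classical counterparts under bandit feedback; the sketch covers only the classical regret updates themselves.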