Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games
Authors: Ziang Song, Song Mei, Yu Bai
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper presents the first sample-efficient algorithm for learning the EFCE from bandit feedback. We begin by proposing K-EFCE, a generalized definition that allows players to observe and deviate from the recommended actions for K times. ... We then design an uncoupled no-regret algorithm that finds an ε-approximate K-EFCE within $\widetilde{O}(\max_i X_i A_i^{K}/\varepsilon^2)$ iterations in the full feedback setting... Finally, we design a sample-based variant of our algorithm that learns an ε-approximate K-EFCE within $\widetilde{O}(\max_i X_i A_i^{K+1}/\varepsilon^2)$ episodes of play in the bandit feedback setting. Our algorithms perform wide-range regret minimization over each infoset to minimize the overall K-EFCE regret, and introduce new efficient sampling policies to handle bandit feedback. |
| Researcher Affiliation | Collaboration | Ziang Song (Stanford University, ziangs@stanford.edu); Song Mei (UC Berkeley, songmei@berkeley.edu); Yu Bai (Salesforce Research, yu.bai@salesforce.com) |
| Pseudocode | Yes | Algorithm 1: Executing modified policy $\phi \diamond \pi_i$. Input: K-EFCE strategy modification $\phi \in \Phi_i^K$ ($0 \le K \le \infty$), policy $\pi_i \in \Pi_i$ for the $i$-th player. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes a theoretical framework for learning in extensive-form games and discusses 'episodes of play' for bandit feedback, implying data generation through interaction rather than the use of a predefined, publicly available training dataset with access information. |
| Dataset Splits | No | The paper focuses on theoretical algorithm design and complexity analysis, and as such, does not describe specific training, validation, or test dataset splits typically used in empirical studies. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and analysis; it does not provide any specific details about hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and describes algorithms; it does not specify any ancillary software dependencies with version numbers needed for replication. |
| Experiment Setup | No | The paper describes theoretical algorithms and their complexity, but it does not provide specific details about an experimental setup, such as hyperparameters or system-level training settings. |
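
The two complexity bounds quoted in the Research Type row differ only in the exponent on $A_i$ (K for full feedback, K+1 for bandit feedback). The sketch below evaluates the leading term $\max_i X_i A_i^{K}/\varepsilon^2$ (or with $K+1$) to show how the bounds scale. It assumes $X_i$ is the number of infosets and $A_i$ the number of actions of player $i$, and it drops the constants and logarithmic factors hidden by $\widetilde{O}$, so the values are only relative scalings, not iteration counts. The function name `kefce_bound` and the game sizes are hypothetical and do not come from the paper.

```python
def kefce_bound(X, A, K, eps, bandit=False):
    """Leading term max_i X_i * A_i**e / eps**2, with e = K (full feedback)
    or e = K + 1 (bandit feedback). Constants and log factors are omitted."""
    exponent = K + 1 if bandit else K
    return max(x * a ** exponent for x, a in zip(X, A)) / eps ** 2


# Hypothetical two-player game: X[i] infosets and A[i] actions for player i.
X = [100, 80]
A = [3, 4]

full = kefce_bound(X, A, K=1, eps=0.1)
bandit = kefce_bound(X, A, K=1, eps=0.1, bandit=True)

print(f"full feedback leading term:   {full:.0f}")
print(f"bandit feedback leading term: {bandit:.0f}")
print(f"ratio bandit / full:          {bandit / full:.1f}")
```

In this toy setting the player with more actions attains the maximum in both cases, so the bandit bound is larger by a factor equal to that player's action count. In general the maximizing player can differ between the two settings.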