Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games

Authors: Ziang Song, Song Mei, Yu Bai

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical This paper presents the first sample-efficient algorithm for learning the EFCE from bandit feedback. We begin by proposing K-EFCE a generalized definition that allows players to observe and deviate from the recommended actions for K times. ... We then design an uncoupled no-regret algorithm that finds an ε-approximate K-EFCE within e O(maxi Xi AK i /ε2) iterations in the full feedback setting... Finally, we design a sample-based variant of our algorithm that learns an ε-approximate K-EFCE within e O(maxi Xi AK+1 i /ε2) episodes of play in the bandit feedback setting. Our algorithms perform wide-range regret minimization over each infoset to minimize the overall K-EFCE regret, and introduce new efficient sampling policies to handle bandit feedback.
Researcher Affiliation Collaboration Ziang Song Stanford University ziangs@stanford.edu Song Mei UC Berkeley songmei@berkeley.edu Yu Bai Salesforce Research yu.bai@salesforce.com
Pseudocode Yes Algorithm 1 Executing modified policy ϕ πi Input: K-EFCE strategy modification ϕ ΦK i (0 K ), policy πi Πi for the ith player.
Open Source Code No The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets No The paper describes a theoretical framework for learning in extensive-form games and discusses 'episodes of play' for bandit feedback, implying data generation through interaction rather than the use of a predefined, publicly available training dataset with access information.
Dataset Splits No The paper focuses on theoretical algorithm design and complexity analysis, and as such, does not describe specific training, validation, or test dataset splits typically used in empirical studies.
Hardware Specification No The paper is theoretical and focuses on algorithm design and analysis; it does not provide any specific details about hardware used for experiments.
Software Dependencies No The paper is theoretical and describes algorithms; it does not specify any ancillary software dependencies with version numbers needed for replication.
Experiment Setup No The paper describes theoretical algorithms and their complexity, but it does not provide specific details about an experimental setup, such as hyperparameters or system-level training settings.