reproducibilityindex.ai

Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games

Authors: Ziang Song, Song Mei, Yu Bai

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This paper presents the first sample-efficient algorithm for learning the EFCE from bandit feedback. We begin by proposing K-EFCE a generalized definition that allows players to observe and deviate from the recommended actions for K times. ... We then design an uncoupled no-regret algorithm that finds an ε-approximate K-EFCE within e O(maxi Xi AK i /ε2) iterations in the full feedback setting... Finally, we design a sample-based variant of our algorithm that learns an ε-approximate K-EFCE within e O(maxi Xi AK+1 i /ε2) episodes of play in the bandit feedback setting. Our algorithms perform wide-range regret minimization over each infoset to minimize the overall K-EFCE regret, and introduce new efficient sampling policies to handle bandit feedback.
Researcher Affiliation	Collaboration	Ziang Song Stanford University ziangs@stanford.edu Song Mei UC Berkeley songmei@berkeley.edu Yu Bai Salesforce Research yu.bai@salesforce.com
Pseudocode	Yes	Algorithm 1 Executing modified policy ϕ πi Input: K-EFCE strategy modification ϕ ΦK i (0 K ), policy πi Πi for the ith player.
Open Source Code	No	The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	No	The paper describes a theoretical framework for learning in extensive-form games and discusses 'episodes of play' for bandit feedback, implying data generation through interaction rather than the use of a predefined, publicly available training dataset with access information.
Dataset Splits	No	The paper focuses on theoretical algorithm design and complexity analysis, and as such, does not describe specific training, validation, or test dataset splits typically used in empirical studies.
Hardware Specification	No	The paper is theoretical and focuses on algorithm design and analysis; it does not provide any specific details about hardware used for experiments.
Software Dependencies	No	The paper is theoretical and describes algorithms; it does not specify any ancillary software dependencies with version numbers needed for replication.
Experiment Setup	No	The paper describes theoretical algorithms and their complexity, but it does not provide specific details about an experimental setup, such as hyperparameters or system-level training settings.