Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games

Authors: Gabriele Farina, Robin Schmucker, Tuomas Sandholm5372-5380

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We provide experiments in this setting. To our knowledge, this is the first implementation of bandit optimization for TFSDM. [...] We implemented our bandit regret minimizer (Algorithm 2) and the algorithm of Abernethy, Hazan, and Rakhlin (2008) (using a logarithmic barrier) (from now on, denoted AHR). We compared them on four domains: a simple 2 3 matrix game [...] and three standard extensive-form games [...] We ran each algorithm 100 times. Figure 2 shows the regret of the algorithms compared to always playing the best-response strategy against s.
Researcher Affiliation Collaboration 1Computer Science Department, Carnegie Mellon University, 2Machine Learning Department, Carnegie Mellon University 3Strategic Machine, Inc., 4Strategy Robot, Inc., 5Optimized Markets, Inc.
Pseudocode Yes In Appendix B in the full version of this paper we give pseudocode for GRADIENT and ARGCONJUGATE. [...] Algorithm 1: Full-information regret minimizer R. [...] Algorithm 2: Bandit regret minimizer R. [...] Algorithm 3: LOSSESTIMATE.
Open Source Code No The paper states 'We implemented our bandit regret minimizer (Algorithm 2)' but does not provide an explicit statement about the release of this code or a link to a repository.
Open Datasets Yes We compared them on four domains: a simple 2 3 matrix game [...] and three standard extensive-form games in the computational game theory literature, namely Kuhn poker (Kuhn 1950), 3-rank Goofspiel(Ross 1971), and Leduc poker (Southey et al. 2005).
Dataset Splits No The paper describes the game environments used for evaluation but does not specify any training, validation, or test dataset splits.
Hardware Specification No The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments.
Software Dependencies Yes We use the Eigen 3.3.3 library to compute the eigendecomposition at each time t.
Experiment Setup Yes For our method, we use the theoretical step size multiplied by 5, while for the method of Abernethy, Hazan, and Rakhlin (2008) we multiply their step size parameter by 2; these changes do not affect the theoretical guarantees but improved the practical performances of both algorithms.