reproducibilityindex.ai

Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games

Authors: Gabriele Farina, Robin Schmucker, Tuomas Sandholm5372-5380

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide experiments in this setting. To our knowledge, this is the ﬁrst implementation of bandit optimization for TFSDM. [...] We implemented our bandit regret minimizer (Algorithm 2) and the algorithm of Abernethy, Hazan, and Rakhlin (2008) (using a logarithmic barrier) (from now on, denoted AHR). We compared them on four domains: a simple 2 3 matrix game [...] and three standard extensive-form games [...] We ran each algorithm 100 times. Figure 2 shows the regret of the algorithms compared to always playing the best-response strategy against s.
Researcher Affiliation	Collaboration	1Computer Science Department, Carnegie Mellon University, 2Machine Learning Department, Carnegie Mellon University 3Strategic Machine, Inc., 4Strategy Robot, Inc., 5Optimized Markets, Inc.
Pseudocode	Yes	In Appendix B in the full version of this paper we give pseudocode for GRADIENT and ARGCONJUGATE. [...] Algorithm 1: Full-information regret minimizer R. [...] Algorithm 2: Bandit regret minimizer R. [...] Algorithm 3: LOSSESTIMATE.
Open Source Code	No	The paper states 'We implemented our bandit regret minimizer (Algorithm 2)' but does not provide an explicit statement about the release of this code or a link to a repository.
Open Datasets	Yes	We compared them on four domains: a simple 2 3 matrix game [...] and three standard extensive-form games in the computational game theory literature, namely Kuhn poker (Kuhn 1950), 3-rank Goofspiel(Ross 1971), and Leduc poker (Southey et al. 2005).
Dataset Splits	No	The paper describes the game environments used for evaluation but does not specify any training, validation, or test dataset splits.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments.
Software Dependencies	Yes	We use the Eigen 3.3.3 library to compute the eigendecomposition at each time t.
Experiment Setup	Yes	For our method, we use the theoretical step size multiplied by 5, while for the method of Abernethy, Hazan, and Rakhlin (2008) we multiply their step size parameter by 2; these changes do not affect the theoretical guarantees but improved the practical performances of both algorithms.