Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games
Authors: Gabriele Farina, Robin Schmucker, Tuomas Sandholm5372-5380
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experiments in this setting. To our knowledge, this is the first implementation of bandit optimization for TFSDM. [...] We implemented our bandit regret minimizer (Algorithm 2) and the algorithm of Abernethy, Hazan, and Rakhlin (2008) (using a logarithmic barrier) (from now on, denoted AHR). We compared them on four domains: a simple 2 3 matrix game [...] and three standard extensive-form games [...] We ran each algorithm 100 times. Figure 2 shows the regret of the algorithms compared to always playing the best-response strategy against s. |
| Researcher Affiliation | Collaboration | 1Computer Science Department, Carnegie Mellon University, 2Machine Learning Department, Carnegie Mellon University 3Strategic Machine, Inc., 4Strategy Robot, Inc., 5Optimized Markets, Inc. |
| Pseudocode | Yes | In Appendix B in the full version of this paper we give pseudocode for GRADIENT and ARGCONJUGATE. [...] Algorithm 1: Full-information regret minimizer R. [...] Algorithm 2: Bandit regret minimizer R. [...] Algorithm 3: LOSSESTIMATE. |
| Open Source Code | No | The paper states 'We implemented our bandit regret minimizer (Algorithm 2)' but does not provide an explicit statement about the release of this code or a link to a repository. |
| Open Datasets | Yes | We compared them on four domains: a simple 2 3 matrix game [...] and three standard extensive-form games in the computational game theory literature, namely Kuhn poker (Kuhn 1950), 3-rank Goofspiel(Ross 1971), and Leduc poker (Southey et al. 2005). |
| Dataset Splits | No | The paper describes the game environments used for evaluation but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments. |
| Software Dependencies | Yes | We use the Eigen 3.3.3 library to compute the eigendecomposition at each time t. |
| Experiment Setup | Yes | For our method, we use the theoretical step size multiplied by 5, while for the method of Abernethy, Hazan, and Rakhlin (2008) we multiply their step size parameter by 2; these changes do not affect the theoretical guarantees but improved the practical performances of both algorithms. |