Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games

Authors: Gabriele Farina, Christian Kroer, Tuomas Sandholm

AAAI 2019, pp. 1917-1925 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our framework leads to algorithms that scale at a rate comparable to the fastest variants of counterfactual regret minimization for computing Nash equilibrium, and therefore our approach leads to the first algorithm for computing quantal response equilibria in extremely large games.
Researcher Affiliation | Academia | Gabriele Farina, Christian Kroer, Tuomas Sandholm; Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213; {gfarina,ckroer,sandholm}@cs.cmu.edu
Pseudocode | Yes | Algorithm 1 gives pseudocode for a sample implementation of a regret minimizer based on LRD. The algorithm assumes that a regret minimizer Rj has been chosen for each decision point j ∈ J and has been initialized via a call to Rj.INITIALIZE() before the algorithm starts. A concrete example of a local regret minimizer Rj based on online gradient descent (OGD) is given later, together with its implementation, in Algorithm 2. (An illustrative Python sketch of such a local OGD regret minimizer appears after the table.)
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We conducted experiments on two EFG settings. The first game is Leduc 5 poker (Southey et al. 2005), a standard benchmark in imperfect-information game solving... The second game is a variant of Goofspiel (Ross 1971), a bidding game where each player has a hand of cards numbered 1 to N.
Dataset Splits | No | The paper describes experiments in game environments (Leduc 5 poker, Goofspiel), which do not involve the explicit train/validation/test splits used in supervised machine learning; accordingly, no such splits are provided.
Hardware Specification | No | The paper states 'Our code is parallel and the experiments were conducted on a machine with 8 cores, but the results are not about timing, so they would be identical even on a single core.' This gives the number of cores but no CPU model or other detailed hardware specifications.
Software Dependencies | No | The paper mentions algorithms like 'online gradient descent (OGD)' and 'Newton's method' but does not provide version numbers for any software libraries, frameworks, or compilers used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We used alternation (Figure 3) only in the implementation of CFR+; all other algorithms use the flow of Figure 2. We stopped the training of the exploiter after 5000 iterations or when an average regret of 0.0005 was reached, whichever happened first. For simplicity, we computed a logit QRE with parameter λ = 1. In other words, at each decision point, we added to the loss a term α‖z‖², where z is the strategy at that decision point. (The regularized loss is illustrated in the second sketch after the table.)
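
The Pseudocode row quotes an interface in which each decision point j has a local regret minimizer Rj, initialized via Rj.INITIALIZE(), with an OGD instantiation given in the paper's Algorithm 2. Below is a minimal Python sketch of what such a local OGD regret minimizer over the probability simplex at a single decision point might look like; the class name, the methods other than initialize(), and the constant step size are assumptions, not the paper's Algorithm 2.

import numpy as np

class OGDSimplexRegretMinimizer:
    """Hypothetical sketch of a local regret minimizer R_j over the
    probability simplex at one decision point. Only INITIALIZE() is
    named in the quote above; next_strategy/observe_loss and the
    constant step size eta are assumptions."""

    def __init__(self, n_actions, eta=0.1):
        self.n = n_actions
        self.eta = eta  # assumed constant step size
        self.x = None

    def initialize(self):
        # Start from the uniform strategy over the n actions.
        self.x = np.full(self.n, 1.0 / self.n)

    def next_strategy(self):
        return self.x

    def observe_loss(self, loss):
        # Gradient step on the observed linear loss, then project
        # back onto the simplex (online gradient descent).
        self.x = self._project_simplex(self.x - self.eta * np.asarray(loss))

    @staticmethod
    def _project_simplex(v):
        # Euclidean projection onto the probability simplex
        # (sort-based method of Duchi et al., 2008).
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        idx = np.arange(1, v.size + 1)
        rho = np.nonzero(u * idx > css)[0][-1]
        theta = css[rho] / (rho + 1.0)
        return np.maximum(v - theta, 0.0)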
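
The Experiment Setup row can likewise be made concrete. Reading the added term as the squared norm α‖z‖², the local loss at a decision point becomes ⟨g, z⟩ + α‖z‖², so the gradient fed to the local regret minimizer is g + 2αz. The following is a purely illustrative sketch reusing the class above; the helper name, α = 1, and the synthetic loss vectors are all assumptions, not the paper's setup.

import numpy as np

def regularized_loss_gradient(g, z, alpha=1.0):
    # Hypothetical helper: gradient of the locally regularized loss
    # <g, z> + alpha * ||z||^2; alpha plays the role of the weight
    # quoted in the Experiment Setup row.
    return np.asarray(g) + 2.0 * alpha * np.asarray(z)

# Synthetic demonstration wiring the regularizer into the OGD sketch above.
rm = OGDSimplexRegretMinimizer(n_actions=3)
rm.initialize()
for g in np.random.default_rng(0).normal(size=(5, 3)):  # made-up loss vectors
    z = rm.next_strategy()
    rm.observe_loss(regularized_loss_gradient(g, z, alpha=1.0))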