Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Minimizing Disappointment in Repeated Games

Authors: J. W. Crandall

JAIR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Specifically, we study the ability of expert algorithms to quickly learn effective strategies in repeated games, towards the ultimate goal of learning near-optimal behavior against any arbitrary associate within only a handful of interactions. Our contribution is three-fold. First, we advocate a new metric, called disappointment, for evaluating expert algorithms in repeated games. ... We first evaluate the effectiveness of several existing expert algorithms to achieve low disappointment when paired with various (1) non-adapting, (2) reinforcement learning, and (3) expert algorithms in two-player games. Finally, we describe a new meta-algorithm for enhancing expert algorithms. We show that it substantially reduces the disappointment of these algorithms against these same associates in both short- and long-term interactions.
Researcher Affiliation | Academia | Jacob W. Crandall (EMAIL), Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates
Pseudocode | Yes | Algorithm 1: A meta-algorithm (for agent i) to enhance expert algorithms.
Input: A (the expert algorithm), Φ_i (the set of experts), and M (the payoff matrix)
Initialize: t = 1; compute z_i^t(φ) for each φ ∈ Φ_i; initialize α_i^t = max_{φ ∈ Φ_i} z_i^t(φ)
repeat
  Compute Φ′_i(t) = {φ ∈ Φ_i : z_i^t(φ) ≥ min_{τ ∈ [ζ, t]} α_i^τ}, where 0 ≤ ζ < t
  Execute (and update) A(Φ′_i(t)) for τ rounds, where τ is specified by A
  Observe r_i^{t+τ}; t = t + τ
  Update α_i^t = λ^τ α_i^{t−τ} + (1 − λ^τ) r_i^t
  Compute z_i^t(φ) for each φ ∈ Φ_i
until Game Over
Open Source Code | No | The paper does not contain any explicit statements or links regarding the public availability of source code for the described methodology.
Open Datasets | No | The paper evaluates algorithms in well-studied games from the literature, such as Prisoner's Dilemma, Chicken, Stag Hunt, etc., and also uses randomly generated games. These are theoretical game setups with defined payoff matrices (Tables 1 and 2), not external datasets with specific access information.
Dataset Splits | No | The paper evaluates algorithms in repeated games with defined payoff matrices (Tables 1 and 2) and in 50 randomly generated 3-action games. The concept of training/test/validation splits for a static dataset is not directly applicable, as the experiments involve learning in interactive game environments over a specified number of rounds (e.g., 1,000 or 50,000) rather than splitting a pre-collected dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | Appendix A describes the algorithms used (e.g., Exp3, UCB1, EEE, S, BR1, Q-learning) and their parameters, but does not provide specific version numbers for underlying software dependencies such as programming languages or libraries.
Experiment Setup | Yes | Table 5 in Appendix A states the parameters and settings used by each algorithm in the paper. For example, for Q-learning: 'Q-values are initialized to the highest possible value 1/(1 − γ). ε = 1/(10 + t/10), γ = 0.95, α = 1/t'. For Exp3: 'η = γ/|Φ_i|, γ = sqrt(|Φ_i| ln(|Φ_i|)/((e − 1)T0)), where T0 is the expected number of rounds in the game.' The paper also mentions that 'Results are an average of 50 trials' (Figure 3 caption) and that 'each algorithm was paired with itself and the other seven algorithms in 50 random 3-action repeated games. Each game consisted of 50,000 rounds.'
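The meta-algorithm quoted in the Pseudocode row above can be sketched in Python. This is an illustrative reading of Algorithm 1, not the paper's implementation: the value estimates z_i^t(φ), the base expert algorithm A, and all function names here are assumptions, with z and base_step abstracted as callables.

```python
def meta_algorithm(experts, z, base_step, total_rounds, lam=0.99, zeta=0):
    """Hedged sketch of Algorithm 1: restrict the base expert algorithm A
    to experts whose value estimate meets the aspiration threshold.

    experts    : list of expert identifiers (the set Phi_i)
    z          : callable z(t, phi) -> estimated value of expert phi at round t
    base_step  : callable base_step(active_experts) -> (reward, tau),
                 standing in for "execute A on the restricted set for tau rounds"
    lam, zeta  : the lambda decay and window start from Algorithm 1 (assumed values)
    """
    t = 1
    # Aspiration level alpha_i^t, initialized to the best expert-value estimate
    alpha_history = {t: max(z(t, phi) for phi in experts)}
    while t < total_rounds:
        # Threshold: minimum aspiration over the window [zeta, t]
        threshold = min(a for tau, a in alpha_history.items() if tau >= zeta)
        # Restricted expert set Phi'_i(t); fall back to all experts if empty
        active = [phi for phi in experts if z(t, phi) >= threshold] or experts
        reward, tau = base_step(active)   # run A(Phi'_i(t)) for tau rounds
        prev_alpha = alpha_history[t]
        t += tau
        # Aspiration update: alpha_i^t = lam^tau * alpha_i^{t-tau} + (1 - lam^tau) * r_i^t
        decay = lam ** tau
        alpha_history[t] = decay * prev_alpha + (1 - decay) * reward
    return t, alpha_history[t]
```

With a constant-reward stub for base_step, the aspiration level decays geometrically from the initial optimistic estimate toward the observed average reward, which is the mechanism the update rule describes.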
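The parameter settings quoted in the Experiment Setup row can be computed directly. A minimal sketch, assuming the formulas exactly as printed in Table 5 (the function names are illustrative, not from the paper):

```python
import math

def exp3_params(num_experts, expected_rounds):
    """Exp3 settings as quoted from Table 5:
    gamma = sqrt(|Phi_i| ln|Phi_i| / ((e - 1) * T0)), eta = gamma / |Phi_i|."""
    gamma = math.sqrt(
        num_experts * math.log(num_experts) / ((math.e - 1) * expected_rounds)
    )
    eta = gamma / num_experts
    return gamma, eta

def q_learning_schedules(t, discount=0.95):
    """Q-learning settings as quoted from Table 5: optimistic initialization
    to 1/(1 - gamma), epsilon = 1/(10 + t/10), learning rate alpha = 1/t."""
    q_init = 1.0 / (1.0 - discount)
    epsilon = 1.0 / (10.0 + t / 10.0)
    alpha = 1.0 / t
    return q_init, epsilon, alpha
```

For instance, with γ = 0.95 the optimistic Q-value initialization is 1/(1 − 0.95) = 20, and both exploration schedules decay toward zero as t grows, matching the paper's stated settings.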