Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Estimating α-Rank by Maximizing Information Gain

Authors: Tabish Rashid, Cheng Zhang, Kamil Ciosek

AAAI 2021, pp. 5673-5681 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 7 (Experiments): "In this section, we describe our results on synthetic games, graphing the Bayesian regret J^B_t described in Section 6. We also justify the use of Bayesian regret, showing that it is highly coupled with the ground-truth payoff. We benchmark two versions of our algorithm, αIG (Bins) and αIG (NSB), which differ in the entropy estimator employed. We compare to three baselines: RG-UCB, a frequentist bandit algorithm (Rowland et al. 2019); Payoff, which maximizes the information gain about the payoff distribution; and Uniform, which selects payoffs uniformly at random."
Researcher Affiliation | Collaboration | Tabish Rashid* (1), Cheng Zhang (2), Kamil Ciosek (2); 1: University of Oxford, 2: Microsoft Research, Cambridge, UK. EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: the αIG algorithm. The αIG(NSB) and αIG(Bin) variants differ in the entropy estimator used (Line 7).
Open Source Code | Yes | Code is available at github.com/microsoft/InfoGainalpharank.
Open Datasets | No | "To investigate our algorithm, we study two environments whose payoffs are shown in Figure 3. We start with the relatively simple environment with 4 agents. Figure 3 (Left) shows the expected payoffs, which we can interpret as the win-rate. Samples are drawn from a Bernoulli distribution with the appropriate mean." (The paper uses synthetic games with dynamically generated samples rather than a fixed public dataset with access information.)
Dataset Splits | No | The paper describes experiments on "synthetic games" where "samples are drawn from a Bernoulli distribution". It does not specify training, validation, or test splits as percentages, sample counts, or references to predefined splits, since data is generated dynamically.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor speeds, or memory amounts) used for running its experiments.
Software Dependencies | No | "In our implementation, we use POT (Flamary and Courty 2017) to approximate this distance." (The paper names this software but gives no version numbers for POT or any other dependency.)
Experiment Setup | Yes | "A detailed explanation of the experimental setup and details on the used hyperparameters are included in the Appendix."
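The Pseudocode row notes that the αIG(Bin) and αIG(NSB) variants differ only in their entropy estimator; NSB presumably refers to the Nemenman–Shafee–Bialek estimator. As a rough illustration of the binned idea only, here is a generic plug-in histogram entropy estimator — a sketch, not the paper's implementation:

```python
import numpy as np

def binned_entropy(samples, n_bins=10):
    """Plug-in entropy estimate (in nats) from a histogram of samples."""
    counts, _ = np.histogram(samples, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
# For uniform samples, the estimate approaches log(n_bins) ≈ 2.30 for 10 bins.
print(binned_entropy(rng.uniform(size=10_000)))
```

Plug-in estimators like this are known to be biased for small sample counts, which is the usual motivation for alternatives such as NSB.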
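The Open Datasets row quotes how the paper generates data: Bernoulli draws around an expected win-rate matrix rather than a fixed dataset. A minimal sketch of that sampling scheme, with a made-up 4-agent matrix (not the values from the paper's Figure 3):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical win-rate matrix: mean_payoffs[i, j] = P(agent i beats agent j).
mean_payoffs = np.array([
    [0.5, 0.6, 0.7, 0.8],
    [0.4, 0.5, 0.6, 0.7],
    [0.3, 0.4, 0.5, 0.6],
    [0.2, 0.3, 0.4, 0.5],
])

def sample_payoff(i, j):
    """Draw one Bernoulli match outcome with the appropriate mean."""
    return rng.binomial(1, mean_payoffs[i, j])

outcomes = [sample_payoff(0, 3) for _ in range(1000)]
print(sum(outcomes) / len(outcomes))  # empirical win-rate, close to 0.8
```

Because samples are generated on demand from the matrix, there is nothing to split into train/validation/test sets, which matches the "No" in the Dataset Splits row.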
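The Software Dependencies row mentions POT (Python Optimal Transport) for approximating a distance between distributions. The paper's exact usage is not reproduced here; the following is a self-contained 1-D example of the kind of call POT provides, with made-up weights and support points:

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

# Two discrete distributions over the same four support points.
x = np.arange(4, dtype=float).reshape(-1, 1)
a = np.array([0.4, 0.3, 0.2, 0.1])  # source weights
b = np.array([0.1, 0.2, 0.3, 0.4])  # target weights

M = ot.dist(x, x, metric="euclidean")  # ground cost matrix
print(ot.emd2(a, b, M))  # exact 1-Wasserstein distance; here 1.0
```

`ot.emd2` solves the exact transport problem; POT also offers entropically regularized (Sinkhorn) solvers when an approximation is acceptable.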