Estimating α-Rank by Maximizing Information Gain

Authors: Tabish Rashid, Cheng Zhang, Kamil Ciosek

AAAI 2021, pp. 5673-5681

Each reproducibility variable below is listed with its result and the supporting LLM response.
Research Type: Experimental. From Section 7 (Experiments) of the paper: "In this section, we describe our results on synthetic games, graphing the Bayesian regret J^B_t described in Section 6. We also justify the use of Bayesian regret, showing that it is highly coupled with the ground-truth payoff. We benchmark two versions of our algorithm, αIG (Bins) and αIG (NSB), which differ in the employed entropy estimator. We compare to three baselines: RG-UCB, a frequentist bandit algorithm (Rowland et al. 2019); Payoff, which maximizes the information gain about the payoff distribution; and Uniform, which selects payoffs uniformly at random."
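The Bayesian regret J^B_t is the quantity graphed throughout these experiments. As a rough illustration of how such a regret curve could be monitored, here is a minimal Python sketch; the alpharank_dist placeholder, the Beta/Bernoulli posterior parameterization, and the use of total variation as the distance are all simplifying assumptions rather than the paper's exact definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def alpharank_dist(payoff):
    # Hypothetical stand-in for an alpha-rank solver returning a distribution
    # over strategies; a real implementation would compute the actual
    # alpha-rank. The softmax over mean payoff here is purely a placeholder.
    logits = payoff.mean(axis=1)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def bayesian_regret(a, b, true_payoff, n_mc=500):
    # One plausible reading of J^B_t: the posterior-expected distance between
    # the ranking induced by payoff matrices drawn from the Beta posterior
    # (parameters a, b per Bernoulli win-rate) and the ground-truth ranking.
    target = alpharank_dist(true_payoff)
    dists = [0.5 * np.abs(alpharank_dist(rng.beta(a, b)) - target).sum()
             for _ in range(n_mc)]
    return float(np.mean(dists))
```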
Researcher Affiliation: Collaboration. Tabish Rashid (University of Oxford); Cheng Zhang and Kamil Ciosek (Microsoft Research, Cambridge, UK). Contact: tabish.rashid@cs.ox.ac.uk, {cheng.zhang, kamil.ciosek}@microsoft.com.
Pseudocode: Yes. Algorithm 1 presents the αIG algorithm; the αIG (NSB) and αIG (Bins) variants differ only in the entropy estimator used on Line 7.
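To make the structure of Algorithm 1 concrete, here is a minimal Python sketch of one information-gain step, under simplifying assumptions: Beta posteriors over Bernoulli win-rates, a plug-in histogram ("Bins") entropy estimator, and a hypothetical top_strategy helper standing in for a full α-rank computation. The paper's Line 7 operates on the actual α-rank distribution, and the NSB variant swaps in the NSB entropy estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_strategy(payoff):
    # Hypothetical stand-in for an alpha-rank solver: summarises a sampled
    # win-rate matrix by its strongest strategy.
    return int(np.argmax(payoff.mean(axis=1)))

def ranking_entropy(a, b, n_mc=300):
    # Plug-in ("Bins") entropy estimate: draw payoff matrices from the Beta
    # posterior, histogram the induced rankings, take the empirical entropy.
    n = a.shape[0]
    labels = [top_strategy(rng.beta(a, b)) for _ in range(n_mc)]
    p = np.bincount(labels, minlength=n) / n_mc
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def select_entry(a, b):
    # Pick the matchup whose next Bernoulli observation maximises the
    # expected reduction in entropy of the ranking distribution.
    h_now = ranking_entropy(a, b)
    best, best_gain = None, -np.inf
    n = a.shape[0]
    for i in range(n):
        for j in range(n):
            p_win = a[i, j] / (a[i, j] + b[i, j])
            h_next = 0.0
            for outcome, w in ((1, p_win), (0, 1.0 - p_win)):
                a2, b2 = a.copy(), b.copy()
                a2[i, j] += outcome
                b2[i, j] += 1 - outcome
                h_next += w * ranking_entropy(a2, b2)
            gain = h_now - h_next
            if gain > best_gain:
                best, best_gain = (i, j), gain
    return best

# One step: Beta(1, 1) priors over each win-rate, select an entry, observe a
# (simulated) match outcome, and update the corresponding posterior.
n = 4
a, b = np.ones((n, n)), np.ones((n, n))
i, j = select_entry(a, b)
outcome = int(rng.random() < 0.7)  # stand-in for querying the real game
a[i, j] += outcome
b[i, j] += 1 - outcome
```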
Open Source Code: Yes. Code is available at github.com/microsoft/InfoGainalpharank.
Open Datasets: No. The paper uses synthetic games with dynamically generated samples rather than a fixed public dataset with access information: "To investigate our algorithm, we study two environments whose payoffs are shown in Figure 3. We start with the relatively simple environment with 4 agents. Figure 3 (Left) shows the expected payoffs, which we can interpret as the win-rate. Samples are drawn from a Bernoulli distribution with the appropriate mean." A sketch of such an environment is given below.
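A minimal sketch of such a synthetic environment, assuming a symmetric pairwise game; the win-rate values below are illustrative placeholders, not the matrix from Figure 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Expected payoffs, interpreted as win-rates, for 4 agents (placeholder values).
win_rate = np.array([
    [0.5, 0.7, 0.8, 0.9],
    [0.3, 0.5, 0.6, 0.7],
    [0.2, 0.4, 0.5, 0.6],
    [0.1, 0.3, 0.4, 0.5],
])

def sample_match(i, j):
    # A single observed payoff: a Bernoulli draw with the appropriate mean,
    # i.e. 1 if agent i beats agent j and 0 otherwise.
    return int(rng.random() < win_rate[i, j])
```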
Dataset Splits: No. The paper describes experiments on synthetic games where "samples are drawn from a Bernoulli distribution". Because the data is generated dynamically, it does not specify training, validation, or test splits in terms of percentages, sample counts, or references to predefined splits.
Hardware Specification: No. The paper does not provide hardware details (e.g., GPU/CPU models, processor speeds, or memory amounts) for the machines used to run its experiments.
Software Dependencies: No. The paper states: "In our implementation, we use POT (Flamary and Courty 2017) to approximate this distance." It refers to software but gives no version numbers for POT or any other dependency.
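POT (the Python Optimal Transport library) provides exact earth mover's distance solvers; a minimal sketch of using it to approximate a distance between two ranking distributions follows. The 0/1 ground cost and the example distributions are assumptions for illustration, not necessarily what the paper uses.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

p = np.array([0.6, 0.3, 0.1])  # e.g. a posterior alpha-rank estimate
q = np.array([0.4, 0.4, 0.2])  # e.g. a reference alpha-rank
M = 1.0 - np.eye(3)            # 0/1 ground cost between strategy profiles

# ot.emd2 returns the optimal transport cost; with a 0/1 cost matrix this
# equals the total variation distance between p and q.
distance = ot.emd2(p, q, M)
print(distance)  # 0.2
```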
Experiment Setup: Yes. "A detailed explanation of the experimental setup and details on the used hyperparameters are included in the Appendix."