reproducibilityindex.ai

Estimating α-Rank by Maximizing Information Gain

Authors: Tabish Rashid, Cheng Zhang, Kamil Ciosek

Pages: 5673-5681

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 7 (Experiments): In this section, we describe our results on synthetic games, graphing the Bayesian regret $J^B_t$ described in Section 6. We also justify the use of Bayesian regret, showing that it is highly coupled with the ground-truth payoff. We benchmark two versions of our algorithm, αIG (Bins) and αIG (NSB), which differ in the employed entropy estimator. We compare to three baselines: RG-UCB, a frequentist bandit algorithm (Rowland et al. 2019); Payoff, which maximizes the information gain about the payoff distribution; and Uniform, which selects payoffs uniformly at random.
Researcher Affiliation	Collaboration	Tabish Rashid*¹, Cheng Zhang², Kamil Ciosek²; ¹University of Oxford; ²Microsoft Research, Cambridge, UK; tabish.rashid@cs.ox.ac.uk, {cheng.zhang, kamil.ciosek}@microsoft.com
Pseudocode	Yes	Algorithm 1 αIG algorithm. αIG(NSB) and αIG(Bin) variants differ in entropy estimator (Line 7).
Open Source Code	Yes	Code is available at github.com/microsoft/InfoGainalpharank.
Open Datasets	No	To investigate our algorithm, we study two environments whose payoffs are shown in Figure 3. We start with the relatively simple environment with 4 agents. Figure 3 (Left) shows the expected payoffs, which we can interpret as the win-rate. Samples are drawn from a Bernoulli distribution with the appropriate mean. (The paper uses synthetic games with dynamically generated samples rather than a fixed public dataset with access information.)
Dataset Splits	No	The paper describes experiments on 'synthetic games' where 'Samples are drawn from a Bernoulli distribution'. It does not specify training, validation, or test dataset splits in terms of percentages, sample counts, or references to predefined splits, as data is generated dynamically.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	In our implementation, we use POT (Flamary and Courty 2017) to approximate this distance. (Refers to software but without specific version numbers for POT or any other dependencies.)
Experiment Setup	Yes	A detailed explanation of the experimental setup and details on the hyperparameters used are included in the Appendix.
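The Bernoulli sampling described in the Open Datasets row, together with the Uniform baseline from the Research Type row, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the 4×4 win-rate matrix `WIN_RATES` is hypothetical (it is not the data shown in Figure 3), and the helper names `sample_payoff` and `uniform_baseline` are invented for this example. Entry [i, j] is read as the probability that agent i beats agent j, and each draw is a Bernoulli outcome with that mean.

```python
import numpy as np

# Hypothetical 4-agent win-rate matrix (NOT the values from Figure 3 of the paper).
# Entry [i, j] is the probability that agent i beats agent j; W[i, j] + W[j, i] = 1.
WIN_RATES = np.array([
    [0.5, 0.7, 0.3, 0.6],
    [0.3, 0.5, 0.8, 0.4],
    [0.7, 0.2, 0.5, 0.9],
    [0.4, 0.6, 0.1, 0.5],
])


def sample_payoff(rng, i, j, win_rates=WIN_RATES):
    """Draw one Bernoulli outcome (1 = win for agent i) with mean win_rates[i, j]."""
    return int(rng.random() < win_rates[i, j])


def uniform_baseline(num_samples, seed=0, win_rates=WIN_RATES):
    """Hypothetical Uniform baseline: pick a payoff entry uniformly at random each step.

    Returns per-entry sample counts and empirical mean payoffs (NaN where unsampled).
    """
    rng = np.random.default_rng(seed)
    n = win_rates.shape[0]
    counts = np.zeros((n, n), dtype=int)
    wins = np.zeros((n, n), dtype=int)
    for _ in range(num_samples):
        i, j = rng.integers(0, n, size=2)  # uniformly chosen (row, column) entry
        wins[i, j] += sample_payoff(rng, i, j, win_rates)
        counts[i, j] += 1
    # Suppress 0/0 warnings for entries that were never sampled; they become NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.where(counts > 0, wins / counts, np.nan)
    return counts, means
```

Because every sample is generated on demand, there is no fixed dataset to split, which is consistent with the Dataset Splits entry.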
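The two αIG variants differ only in the entropy estimator (Line 7 of Algorithm 1). Assuming "Bins" refers to a histogram-based plug-in estimator, a minimal one-dimensional version might look like the sketch below; the function name `plugin_entropy_from_samples`, the bin count, and the value range are hypothetical choices for illustration. NSB, a Bayesian estimator designed to reduce the downward bias of the plug-in estimate when samples are scarce, is not shown.

```python
import numpy as np


def plugin_entropy_from_samples(samples, num_bins=10, value_range=(0.0, 1.0)):
    """Hypothetical helper: plug-in (maximum-likelihood) entropy estimate, in nats,
    of 1-D samples discretized into equal-width bins."""
    counts, _ = np.histogram(samples, bins=num_bins, range=value_range)
    # Empty bins contribute 0 * log(0) = 0, so drop them before taking logs.
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log(probs)))
```

Coarser bins lower the variance of the estimate but blur differences between nearby values, which is the usual trade-off for a binned estimator.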
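POT computes optimal-transport (Wasserstein-type) distances; for example, `ot.emd2(a, b, M)` returns the exact transport cost between histograms `a` and `b` under the ground-cost matrix `M`. Assuming "this distance" in the Software Dependencies row is such an optimal-transport distance between discrete distributions (the row does not say which one), the same quantity can be sketched without POT as a small linear program solved with SciPy. The function name `wasserstein_lp` is hypothetical, and this dense LP formulation is only practical for small supports.

```python
import numpy as np
from scipy.optimize import linprog


def wasserstein_lp(p, q, cost):
    """Hypothetical helper: exact optimal-transport cost between discrete
    distributions p (length n) and q (length m) under an n x m cost matrix."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape
    # Decision variable: transport plan T (n x m), flattened in row-major order.
    # Row-sum constraints: sum_j T[i, j] = p[i].
    a_rows = np.zeros((n, n * m))
    for i in range(n):
        a_rows[i, i * m:(i + 1) * m] = 1.0
    # Column-sum constraints: sum_i T[i, j] = q[j].
    a_cols = np.zeros((m, n * m))
    for j in range(m):
        a_cols[j, j::m] = 1.0
    a_eq = np.vstack([a_rows, a_cols])
    b_eq = np.concatenate([p, q])
    # Minimize total cost sum_ij cost[i, j] * T[i, j] subject to T >= 0.
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)
```

With the discrete metric (cost 1 between distinct outcomes, 0 otherwise), the result equals the total variation distance, half the L1 distance between `p` and `q`, which gives a quick sanity check.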