Estimating α-Rank by Maximizing Information Gain
Authors: Tabish Rashid, Cheng Zhang, Kamil Ciosek (pp. 5673-5681)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 7 (Experiments): In this section, we describe our results on synthetic games, graphing the Bayesian regret $J^B_t$ described in Section 6. We also justify the use of Bayesian regret, showing that it is closely coupled with the ground-truth payoff. We benchmark two versions of our algorithm, αIG (Bins) and αIG (NSB), which differ in the entropy estimator they employ. We compare against three baselines: RG-UCB, a frequentist bandit algorithm (Rowland et al. 2019); Payoff, which maximizes the information gain about the payoff distribution; and Uniform, which selects payoffs uniformly at random. |
| Researcher Affiliation | Collaboration | Tabish Rashid*,1 Cheng Zhang,2 Kamil Ciosek2 1University of Oxford 2Microsoft Research, Cambridge, UK tabish.rashid@cs.ox.ac.uk, {cheng.zhang, kamil.ciosek}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: the αIG algorithm. The αIG (NSB) and αIG (Bins) variants differ in the entropy estimator (Line 7). |
| Open Source Code | Yes | Code is available at github.com/microsoft/InfoGainalpharank. |
| Open Datasets | No | To investigate our algorithm, we study two environments whose payoffs are shown in Figure 3. We start with the relatively simple environment with 4 agents. Figure 3 (Left) shows the expected payoffs, which we can interpret as the win-rate. Samples are drawn from a Bernoulli distribution with the appropriate mean. (The paper uses synthetic games with dynamically generated samples rather than a fixed public dataset with access information.) |
| Dataset Splits | No | The paper describes experiments on 'synthetic games' where 'Samples are drawn from a Bernoulli distribution'. It does not specify training, validation, or test dataset splits in terms of percentages, sample counts, or references to predefined splits, as data is generated dynamically. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | In our implementation, we use POT (Flamary and Courty 2017) to approximate this distance. (POT, the Python Optimal Transport library, is named, but no version numbers are given for it or for any other dependency.) |
| Experiment Setup | Yes | A detailed explanation of the experimental setup and details on the hyperparameters used are included in the Appendix. |
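To make the setup in the Open Datasets row concrete, here is a minimal Python sketch of a 4-agent synthetic game in which each queried payoff entry returns a Bernoulli sample whose mean is the win-rate, together with the Uniform baseline from the Research Type row. The win-rate matrix is a hypothetical placeholder, not the values shown in Figure 3, and the independent Beta(1, 1) posterior per entry is an assumption of this sketch rather than a detail confirmed above. `WIN_RATES`, `BetaPosterior`, and `run_uniform_baseline` are illustrative names, not the repository's API.

```python
import random

# Hypothetical 4-agent win-rate matrix (placeholder values, NOT Figure 3).
# WIN_RATES[i][j] is the probability that agent i beats agent j, i.e. the
# expected payoff; the matrix is constant-sum: p_ij + p_ji = 1.
WIN_RATES = [
    [0.5, 0.7, 0.3, 0.6],
    [0.3, 0.5, 0.8, 0.4],
    [0.7, 0.2, 0.5, 0.9],
    [0.4, 0.6, 0.1, 0.5],
]


def sample_outcome(i, j, rng):
    """Draw one Bernoulli game result: 1 if agent i beats agent j, else 0."""
    return 1 if rng.random() < WIN_RATES[i][j] else 0


class BetaPosterior:
    """Beta(alpha, beta) posterior over a single entry's win-rate.

    Assumption of this sketch: a Beta(1, 1) prior, updated independently
    per entry (the constant-sum constraint is not enforced).
    """

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha
        self.beta = beta

    def update(self, outcome):
        # Conjugate update: a win adds to alpha, a loss adds to beta.
        self.alpha += outcome
        self.beta += 1 - outcome

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def sample(self, rng):
        # One posterior draw of the win-rate.
        return rng.betavariate(self.alpha, self.beta)


def run_uniform_baseline(num_samples, seed=0):
    """Uniform baseline: query a uniformly random payoff entry each step."""
    rng = random.Random(seed)
    n = len(WIN_RATES)
    posteriors = [[BetaPosterior() for _ in range(n)] for _ in range(n)]
    for _ in range(num_samples):
        i, j = rng.randrange(n), rng.randrange(n)
        posteriors[i][j].update(sample_outcome(i, j, rng))
    return posteriors
```

Replacing the uniform choice of `(i, j)` with a rule that scores each entry by expected information gain is, broadly, where the Payoff baseline (information about the payoff distribution) and αIG (information about α-rank) depart from Uniform.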
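The two αIG variants differ in the entropy estimator. As a hedged illustration of what a binned estimator can look like, the sketch below discretizes sampled probability vectors (for example, α-rank distributions computed from posterior payoff samples) into equal-width bins per coordinate and returns the plug-in entropy in nats. The binning scheme and the name `binned_entropy` are assumptions of this sketch, not the code used for αIG (Bins).

```python
import math
from collections import Counter


def binned_entropy(samples, num_bins=10):
    """Plug-in (maximum-likelihood) entropy estimate, in nats.

    Each sample is a vector with coordinates in [0, 1]; every coordinate is
    mapped to one of `num_bins` equal-width bins, and the entropy of the
    empirical distribution over the resulting cells is returned.
    """
    if not samples:
        raise ValueError("need at least one sample")

    def to_cell(vec):
        # min(...) keeps a coordinate equal to 1.0 in the last bin.
        return tuple(min(int(x * num_bins), num_bins - 1) for x in vec)

    counts = Counter(to_cell(v) for v in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

Plug-in estimates like this are biased downward when the number of samples is small relative to the number of occupied bins; the NSB (Nemenman-Shafee-Bialek) estimator is a Bayesian alternative aimed at reducing that bias in the undersampled regime.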