Learning to Identify Top Elo Ratings: A Dueling Bandits Approach

Authors: Xue Yan, Yali Du, Binxin Ru, Jun Wang, Haifeng Zhang, Xu Chen (pp. 8797-8805)

AAAI 2022

Reproducibility

Each entry below lists the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. "We empirically demonstrate that our method achieves superior convergence speed and time efficiency on a variety of gaming tasks. We consider the following two batteries of experiments to evaluate the performance of our algorithms in the scenarios of transitive and intransitive real-world meta-games."
Researcher Affiliation: Academia. "Xue Yan 1,2, Yali Du* 3, Binxin Ru 4, Jun Wang 5, Haifeng Zhang 1,2, Xu Chen 6. 1 Institute of Automation, Chinese Academy of Sciences, China; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, China; 3 Department of Informatics, King's College London, UK; 4 Machine Learning Research Group, University of Oxford, UK; 5 Department of Computer Science, University College London, UK; 6 Gaoling School of Artificial Intelligence, Renmin University of China, China"
Pseudocode: Yes. "Algorithm 1: MaxIn-Elo: dueling bandits with online SGD for top player identification."
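The paper's Algorithm 1 is not reproduced in this report. As an illustration only, the core idea it names, updating Elo ratings by online SGD on the logistic loss after each sampled duel, can be sketched as follows. All function names, the uniform pair sampling (the paper instead selects duels with a UCB rule), and the conventional base-10/400 Elo scaling are assumptions, not taken from the paper:

```python
import random

def elo_win_prob(r_i, r_j):
    # Classic Elo logistic model: probability that player i beats player j.
    # The base-10 / 400 scaling is the conventional choice, assumed here.
    return 1.0 / (1.0 + 10.0 ** ((r_j - r_i) / 400.0))

def sgd_elo_step(ratings, i, j, outcome, lr):
    # One online-SGD step on the logistic (cross-entropy) loss of a duel
    # outcome (1.0 if i won, 0.0 if j won). The gradient w.r.t. r_i is
    # proportional to (outcome - p); r_j moves the opposite way.
    p = elo_win_prob(ratings[i], ratings[j])
    ratings[i] += lr * (outcome - p)
    ratings[j] -= lr * (outcome - p)

def identify_top_player(true_skill, n_duels=5000, lr=8.0, seed=0):
    # Toy loop: sample random pairs, observe a stochastic outcome drawn
    # from the hidden skills, update Elo estimates by online SGD, and
    # return the index of the highest-rated player.
    rng = random.Random(seed)
    n = len(true_skill)
    ratings = [0.0] * n
    for _ in range(n_duels):
        i, j = rng.sample(range(n), 2)
        p_true = elo_win_prob(true_skill[i], true_skill[j])
        outcome = 1.0 if rng.random() < p_true else 0.0
        sgd_elo_step(ratings, i, j, outcome, lr)
    return max(range(n), key=lambda k: ratings[k])
```

With well-separated hidden skills this recovers the true top player; the paper's MaxIn-Elo additionally uses UCB-driven duel selection and batched updates rather than uniform sampling.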
Open Source Code: No. The paper does not provide a link to, or an explicit statement about releasing, source code for the described methodology.
Open Datasets: Yes. "We do our experiments on twelve real-world games released by Czarnecki et al. (2020), most of which are implemented on the OpenSpiel framework (Lanctot et al. 2019)."
Dataset Splits: No. The paper states that "the batch size τ of MaxIn-P, MaxIn-Elo and MaxIn-mElo is set to 0.7n" but does not specify training, validation, or test splits (e.g., percentages, sample counts, or references to standard splits for the datasets used).
Hardware Specification: Yes. "All experiments were run in a single x86_64 GNU/Linux machine with 256 AMD EPYC 7742 64-Core Processor and 2 A100 PCIe 40GB GPU."
Software Dependencies: Yes. "We use sklearn (0.24.2) to solve the MLE."
Experiment Setup: Yes. "Parameters setting: For the Random, DBGD, and RG-UCB baselines, we perform a grid search for the initial step size η in the range {0.01, 0.05, 0.1, 0.5, 1, 5, 10}. For RG-UCB, the stopping confidence δ = 0.2. For MaxIn-P, we tune the UCB balance parameter γ ∈ {0.2, 0.4, 0.6, ..., 2.0}. For MaxIn-Elo and MaxIn-mElo, we tune the initial learning rate η ∈ {0.01, 0.05, 0.1, 0.5, 1, 5, 10}, with the learning rate at batch j set as η_j, and the UCB balance parameter γ ∈ {0.2, 0.4, 0.6, ..., 2.0}. The batch size τ of MaxIn-P, MaxIn-Elo and MaxIn-mElo is set to 0.7n."
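As a small sanity check on the search spaces quoted above, the η and γ grids can be enumerated to count the configurations tuned per algorithm (a sketch only; the variable names are not from the paper):

```python
from itertools import product

# Initial learning rates / step sizes searched, as quoted above.
etas = [0.01, 0.05, 0.1, 0.5, 1, 5, 10]

# UCB balance parameter grid {0.2, 0.4, ..., 2.0} in steps of 0.2.
gammas = [round(0.2 * k, 1) for k in range(1, 11)]

# Joint grid when both η and γ are tuned (MaxIn-Elo / MaxIn-mElo).
configs = list(product(etas, gammas))

print(len(etas), len(gammas), len(configs))  # 7 10 70
```

So each of MaxIn-Elo and MaxIn-mElo is tuned over 70 (η, γ) combinations, while the baselines search only the 7-point η grid.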