Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning to Identify Top Elo Ratings: A Dueling Bandits Approach
Authors: Xue Yan, Yali Du, Binxin Ru, Jun Wang, Haifeng Zhang, Xu Chen | Pages: 8797-8805
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our method achieves superior convergence speed and time efficiency on a variety of gaming tasks. We consider the following two batteries of experiments to evaluate the performance of our algorithms in the scenarios of transitive and intransitive real world meta-games. |
| Researcher Affiliation | Academia | Xue Yan 1,2, Yali Du* 3, Binxin Ru 4, Jun Wang 5, Haifeng Zhang 1,2, Xu Chen 6. 1 Institute of Automation, Chinese Academy of Sciences, China; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, China; 3 Department of Informatics, King's College London, UK; 4 Machine Learning Research Group, University of Oxford, UK; 5 Department of Computer Science, University College London, UK; 6 Gaoling School of Artificial Intelligence, Renmin University of China, China |
| Pseudocode | Yes | Algorithm 1: MaxIn-Elo: Dueling bandits with online SGD for top player identification. |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about releasing the source code for the methodology described. |
| Open Datasets | Yes | We do our experiments on twelve real-games released by Czarnecki et al. (2020), most of which are implemented on the OpenSpiel framework (Lanctot et al. 2019). |
| Dataset Splits | No | The paper mentions 'The batch size τ of MaxInP, MaxIn-Elo and MaxIn-mElo is set to 0.7n.' but does not specify training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits for the datasets used). |
| Hardware Specification | Yes | All experiments were run in a single x86_64 GNU/Linux machine with 256 AMD EPYC 7742 64-Core Processor and 2 A100 PCIe 40GB GPU. |
| Software Dependencies | Yes | We use sklearn(0.24.2) to solve the MLE. |
| Experiment Setup | Yes | Parameters setting: For Random, DBGD, and RG-UCB baseline, we perform a grid search for the initial step size η in the range {0.01, 0.05, 0.1, 0.5, 1, 5, 10}. For RG-UCB, stopping confidence δ = 0.2. For MaxInP, we tune the UCB balanced parameter γ ∈ {0.2, 0.4, 0.6, ..., 2.0}. For MaxIn-Elo and MaxIn-mElo, we tune the initialized learning rate η ∈ {0.01, 0.05, 0.1, 0.5, 1, 5, 10}, and the learning rate at batch j is set as η j . And the UCB balanced parameter γ ∈ {0.2, 0.4, 0.6, ..., 2.0}. The batch size τ of MaxInP, MaxIn-Elo and MaxIn-mElo is set to 0.7n. |
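
For readers who want to see the size of the search space, the η and γ grids quoted in the Experiment Setup row can be enumerated with a short Python sketch. This is an illustration only, not the authors' code: the names `ETA_GRID`, `GAMMA_GRID`, `batch_size`, and `config_grid` are hypothetical, `n` is treated as a plain integer argument because the quoted row does not define it, and flooring 0.7n to an integer is an assumption.

```python
from itertools import product

# Grids as quoted in the Experiment Setup row.
ETA_GRID = [0.01, 0.05, 0.1, 0.5, 1, 5, 10]    # initial learning rate / step size η
GAMMA_GRID = [k / 5 for k in range(1, 11)]     # UCB parameter γ: 0.2, 0.4, ..., 2.0


def batch_size(n: int) -> int:
    """Batch size τ read as 0.7 * n; flooring to an integer is an assumption."""
    return (7 * n) // 10


def config_grid(n: int) -> list[dict]:
    """Enumerate every (η, γ) pair, each paired with the batch size for n."""
    tau = batch_size(n)
    return [
        {"eta": eta, "gamma": gamma, "tau": tau}
        for eta, gamma in product(ETA_GRID, GAMMA_GRID)
    ]


if __name__ == "__main__":
    grid = config_grid(n=10)
    print(len(grid), grid[0])
```

Under these assumptions the joint grid has 7 × 10 = 70 configurations per game. The quoted row does not say whether η and γ were tuned jointly or one at a time, so the full Cartesian product here is only one possible reading.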