Learning to Identify Top Elo Ratings: A Dueling Bandits Approach
Authors: Xue Yan, Yali Du, Binxin Ru, Jun Wang, Haifeng Zhang, Xu Chen (pp. 8797-8805)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our method achieves superior convergence speed and time efficiency on a variety of gaming tasks. We consider the following two batteries of experiments to evaluate the performance of our algorithms in the scenarios of transitive and intransitive real world meta-games. |
| Researcher Affiliation | Academia | Xue Yan (1,2), Yali Du* (3), Binxin Ru (4), Jun Wang (5), Haifeng Zhang (1,2), Xu Chen (6); 1: Institute of Automation, Chinese Academy of Sciences, China; 2: School of Artificial Intelligence, University of Chinese Academy of Sciences, China; 3: Department of Informatics, King's College London, UK; 4: Machine Learning Research Group, University of Oxford, UK; 5: Department of Computer Science, University College London, UK; 6: Gaoling School of Artificial Intelligence, Renmin University of China, China |
| Pseudocode | Yes | Algorithm 1: Max In-Elo: Dueling bandits with online SGD for top player identification. |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about releasing the source code for the methodology described. |
| Open Datasets | Yes | We do our experiments on twelve real-world games released by Czarnecki et al. (2020), most of which are implemented on the OpenSpiel framework (Lanctot et al. 2019). |
| Dataset Splits | No | The paper mentions 'The batch size τ of Max In P, Max In Elo and Max In-m Elo is set to 0.7 n.' but does not specify training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits for the datasets used). |
| Hardware Specification | Yes | All experiments were run on a single x86_64 GNU/Linux machine with a 256-thread AMD EPYC 7742 64-Core Processor and 2 A100 PCIe 40GB GPUs. |
| Software Dependencies | Yes | We use sklearn (0.24.2) to solve the MLE. |
| Experiment Setup | Yes | Parameter settings: For the Random, DBGD, and RG-UCB baselines, we perform a grid search for the initial step size η over {0.01, 0.05, 0.1, 0.5, 1, 5, 10}. For RG-UCB, the stopping confidence is δ = 0.2. For Max In-P, we tune the UCB balance parameter γ ∈ {0.2, 0.4, 0.6, ..., 2.0}. For Max In-Elo and Max In-m Elo, we tune the initial learning rate η ∈ {0.01, 0.05, 0.1, 0.5, 1, 5, 10}, with the learning rate at batch j set to η/j, and the UCB balance parameter γ ∈ {0.2, 0.4, 0.6, ..., 2.0}. The batch size τ of Max In-P, Max In-Elo, and Max In-m Elo is set to 0.7 n. |
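
The pseudocode row above names Algorithm 1, "Max In-Elo: dueling bandits with online SGD for top player identification." The sketch below illustrates only the general idea of that combination, not the authors' algorithm: the 6-player skill vector is invented, the uniform random pair selection stands in for the paper's UCB-based duel choice, and the η/√t step decay is a common default rather than the paper's schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skills for 6 players (illustrative, not from the paper).
true_skill = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0])
n = len(true_skill)

def duel(i, j):
    """Player i beats player j with Bradley-Terry probability sigmoid(s_i - s_j)."""
    p = 1.0 / (1.0 + np.exp(-(true_skill[i] - true_skill[j])))
    return rng.random() < p

ratings = np.zeros(n)   # online Elo estimates
eta = 0.5               # initial step size (one of the grid values above)
T = 5000

for t in range(T):
    # Uniform random pair selection -- a stand-in for the paper's
    # UCB-based choice of which two players to duel next.
    i, j = rng.choice(n, size=2, replace=False)
    outcome = 1.0 if duel(i, j) else 0.0
    p_hat = 1.0 / (1.0 + np.exp(-(ratings[i] - ratings[j])))
    step = eta / np.sqrt(t + 1)   # decaying learning rate
    grad = outcome - p_hat        # stochastic gradient of the log-likelihood
    ratings[i] += step * grad
    ratings[j] -= step * grad

print("estimated top player:", int(np.argmax(ratings)))
```

Each duel outcome nudges both players' ratings along the gradient of the Bradley-Terry log-likelihood, so the ranking of `ratings` gradually matches the ranking of the true skills.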
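
The software-dependencies row states that sklearn (0.24.2) is used to solve the MLE. One standard way to fit Elo/Bradley-Terry ratings by maximum likelihood with sklearn is logistic regression on signed indicator features; the sketch below assumes that formulation (with simulated duels and hypothetical skills) rather than reproducing the paper's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical skills for 4 players (illustrative only).
true_skill = np.array([0.0, 1.0, 2.0, 3.0])
n = len(true_skill)

# Simulate duel records: feature vector has +1 at winner-candidate i,
# -1 at opponent j; label is whether i actually won.
X, y = [], []
for _ in range(2000):
    i, j = rng.choice(n, size=2, replace=False)
    p = 1.0 / (1.0 + np.exp(-(true_skill[i] - true_skill[j])))
    x = np.zeros(n)
    x[i], x[j] = 1.0, -1.0
    X.append(x)
    y.append(int(rng.random() < p))

# Logistic regression without intercept on difference features recovers
# the ratings up to an additive constant; a large C approximates the
# unpenalized MLE.
clf = LogisticRegression(fit_intercept=False, C=1e6, max_iter=1000)
ratings = clf.fit(np.array(X), np.array(y)).coef_.ravel()
print("rating order (worst to best):", np.argsort(ratings))
```

Because adding a constant to every rating leaves all win probabilities unchanged, only rating differences are identifiable; the small implicit L2 penalty pins down one representative solution.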