Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Tackling Biased Evaluators in Dueling Bandits

Authors: Ming Tang, Yuxuan Zhou, Chao Huang

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments show that, compared with baselines, our algorithms reduce the regret by up to 86.9%. ... 5 Experiments ... We compare our algorithms with five baselines in dueling bandits: Relative Confidence (denoted by 'RC') [10], Relative UCB (denoted by 'RUCB') [11], a Bayesian method Double Thompson ...

Table 1: Performance under diverse arm heterogeneity (denoted by 'heter.') and bias concentration (denoted by 'concentr.') with 10 arms and 10 evaluators. Our methods are marked with '(*)'. Columns give cumulative average regret ("Avg. regret") and cumulative weak regret ("Weak regret"), each for arm heterogeneity σ² ∈ {1.0, 2.0, 4.0} and bias concentration α_B ∈ {1.0, 2.0, 3.0}.

Method     | Avg. regret, σ²   | Avg. regret, α_B  | Weak regret, σ²   | Weak regret, α_B
           | 1.0   2.0   4.0   | 1.0   2.0   3.0   | 1.0   2.0   4.0   | 1.0   2.0   3.0
RC         | 1374  1338  967   | 2845  1338  687   | 596   525   502   | 1847  525   278
RUCB       | 1906  2134  1154  | 2832  2134  1185  | 1018  1144  719   | 1829  1144  506
DT         | 1396  1425  942   | 2621  1425  640   | 445   492   375   | 1428  492   191
MBTW       | 1220  1509  726   | 1769  1509  1448  | 175   162   140   | 569   162   92
UCB        | 1283  1426  732   | 2581  1426  706   | 553   548   336   | 1583  548   153
RC-B(*)    | 1378  1611  1050  | 2207  1611  803   | 649   869   727   | 1119  869   502
RUCB-B(*)  | 993   1120  709   | 1191  1120  1055  | 422   480   370   | 604   480   446
DT-B(*)    | 430   411   344   | 631   411   280   | 198   210   168   | 436   210   110
BS-UN(*)   | 690   689   387   | 825   689   637   | 194   161   94    | 340   161   92
BS-K(*)    | 654   713   407   | 554   713   624   | 116   90    79    | 60    90    82
Researcher Affiliation Academia Ming Tang, Dept. of Computer Science and Engineering, Southern Univ. of Science and Technology, Shenzhen, Guangdong, China, EMAIL; Yuxuan Zhou, Dept. of Mathematics, Southern Univ. of Science and Technology, Shenzhen, Guangdong, China, EMAIL; Chao Huang, School of Computing, Montclair State University, Montclair, New Jersey, USA, EMAIL
Pseudocode Yes Algorithm 1 Bias-Sensitive UCB Algorithm
  1: for each time slot t = 1 to T do
  2:   Update p̂_ij(t − 1) using (10) and r_ij(t − 1) using (11);
  3:   Estimate UCB_i(t) using (13) for i ∈ K;
  4:   Select the arms x_1(t) and x_2(t) that optimize problem (14);
  5: end for
...
Algorithm 2 Extended Bias-Sensitive UCB Algorithm
  1: for each time slot t = 1 to T do
  2:   Set p̂_ij(t − 1) = p^m_ij(t − 1), compute η̂_m(t − 1) using (18);
  3:   Repeat twice
  4:     Update p̂_ij(t − 1) using (16), update η̂_m(t − 1) using (18);
  5:   Compute r̂_ij(t − 1) with the estimated bias η̂_m(t − 1) using (19);
  6:   Substitute p̂_ij(t − 1) and r̂_ij(t − 1) into (12) and (13) to compute UCB_i(t), i ∈ K;
  7:   Select the arms x_1(t) and x_2(t) that optimize problem (14);
  8: end for
Open Source Code Yes Our code is built based on open source code [28] for dueling bandits. ... Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have included the code in supplementary material.
Open Datasets No Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have included the code in supplementary material. The experiments in this work do not rely on datasets.
Dataset Splits No Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have included the code in supplementary material. The experiments in this work do not rely on datasets.
Hardware Specification Yes Experiments are conducted on a compute platform with an AMD Ryzen 7 7800X3D (8-core) processor and 64 GB of RAM (4800 MHz).
Software Dependencies No Our code is built based on open source code [28] for dueling bandits. ... [28] Duel Py Documentation: Examples and Writing a New Algorithm. 2023. URL: https://duelpy.gitlab.io/duelpy/examples.html#writing-a-new-algorithm. Explanation: The paper mentions using 'duelpy' as a base for their code, but does not provide a specific version number for this or any other software dependency.
Experiment Setup Yes Unless otherwise specified, we set η_m ∼ Beta(α_B = 2, β_B = 1) and s_i ∼ N(μ = 0, σ² = 2). Through empirical tests, we set α = α_0 (Σ_{m ∈ M_ij(t)} (2η_m − 1)²)² / (Σ_{m ∈ M_ij(t)} |2η_m − 1|)², where α_0 = 0.51 [11] and η_m can be the most recent estimated value in the unknown-bias case. The term α relies on the recent estimation of η_m and helps to mitigate the over-exploration due to the presence of evaluators' bias. We set coefficient c = 50.
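
The setup quoted in the Experiment Setup row can be illustrated with a minimal Python sketch, which is not the authors' code. It draws η_m from Beta(2, 1) and s_i from N(0, 2), then evaluates the α expression. Several things here are assumptions: the function names `sample_setup` and `exploration_alpha` are hypothetical, M_ij(t) is read as the set of evaluators that have compared arms i and j up to time t, the defaults of 10 arms and 10 evaluators are taken from the Table 1 caption, and the fallback for a zero denominator is our own choice.

```python
import math
import random


def sample_setup(num_arms=10, num_evaluators=10, alpha_b=2.0, beta_b=1.0,
                 sigma2=2.0, seed=0):
    """Draw eta_m ~ Beta(alpha_b, beta_b) and s_i ~ N(0, sigma2).

    Hypothetical helper: the defaults mirror the values quoted above.
    """
    rng = random.Random(seed)
    etas = [rng.betavariate(alpha_b, beta_b) for _ in range(num_evaluators)]
    # random.gauss expects a standard deviation, so convert the variance.
    scores = [rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(num_arms)]
    return etas, scores


def exploration_alpha(etas, evaluators, alpha0=0.51):
    """alpha0 * (sum (2*eta - 1)^2)^2 / (sum |2*eta - 1|)^2 over `evaluators`.

    `evaluators` is ASSUMED to be the index set M_ij(t); in the
    unknown-bias case `etas` would hold the most recent estimates.
    """
    num = sum((2.0 * etas[m] - 1.0) ** 2 for m in evaluators) ** 2
    den = sum(abs(2.0 * etas[m] - 1.0) for m in evaluators) ** 2
    if den == 0.0:
        # Empty set or every eta equal to 0.5: the ratio is undefined.
        # Falling back to alpha0 is our choice, not taken from the paper.
        return alpha0
    return alpha0 * num / den


if __name__ == "__main__":
    etas, scores = sample_setup()
    print(exploration_alpha(etas, range(len(etas))))
```

Under this reading, if every evaluator in the set has η_m = 1, then each |2η_m − 1| equals 1 and the expression reduces to α = α_0. Values of η_m closer to 0.5 shrink α, which is consistent with the quoted remark that α mitigates over-exploration.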