Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Online Clustering of Dueling Bandits

Authors: Zhiyong Wang, Jiahang Sun, Mingze Kong, Jize Xie, Qinghua Hu, John C.S. Lui, Zhongxiang Dai

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct rigorous theoretical analysis for both our COLDB and CONDB algorithms, and our theoretical results demonstrate that the regret upper bounds of both algorithms are sub-linear and that a larger degree of user collaboration (i.e., when a larger number of users belong to the same cluster on average) leads to theoretically guaranteed improvement (Sec. 4). In addition, we also perform both synthetic and real-world experiments to demonstrate the practical advantage of our algorithms and the benefit of user collaboration in contextual MAB problems with preference feedback (Sec. 5). Extensive empirical evaluations on synthetic and real-world datasets further validate the effectiveness of our methods, establishing their potential in real-world applications involving multiple users with preference-based feedback.
Researcher Affiliation	Academia	1 The Chinese University of Hong Kong; 2 Tongji University; 3 The Chinese University of Hong Kong, Shenzhen; 4 Hong Kong University of Science and Technology; 5 Tianjin University. Correspondence to: Zhongxiang Dai <EMAIL>.
Pseudocode	Yes	Algorithm 1 Clustering Of Linear Dueling Bandits (COLDB) ... Algorithm 2 Clustering Of Neural Dueling Bandits (CONDB)
Open Source Code	No	The paper does not contain any explicit statement about open-sourcing the code, nor does it provide any links to a code repository or mention code in supplementary materials.
Open Datasets	Yes	In the experiment with the MovieLens dataset (Harper & Konstan, 2015), we follow the experimental setting from Wang et al. (2024a), a setting with 200 users.
Dataset Splits	No	In our synthetic experiment for COLDB, we design a setting with linear reward functions: $f_i(x) = \theta_i^\top x$. We choose u = 200 users, K = 20 arms and a feature dimension of d = 20, and construct two settings with m = 2 and m = 5 groundtruth clusters, respectively. In the experiment with the MovieLens dataset (Harper & Konstan, 2015), we follow the experimental setting from Wang et al. (2024a), a setting with 200 users. Same as the synthetic experiment, we choose the number of arms in every round to be K = 20 and let the input feature dimension be d = 20. We construct a setting with m = 5 clusters. The paper describes dataset characteristics and experimental parameters but does not specify how the MovieLens or synthetic datasets were split into training, validation, or test sets.
Hardware Specification	No	The paper does not specify any particular hardware (e.g., GPU, CPU models, or cloud computing instances) used for running the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies or their version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup	Yes	In our synthetic experiment for COLDB, we design a setting with linear reward functions: $f_i(x) = \theta_i^\top x$. We choose u = 200 users, K = 20 arms and a feature dimension of d = 20, and construct two settings with m = 2 and m = 5 groundtruth clusters, respectively. ... Input: $f(T_{i,t})$, with terms $2\log(u/\delta) + d\log(1 + 4T_{i,t}\kappa_\mu/(d\lambda))$ and $2\lambda_x T_{i,t}$; regularization parameter $\lambda > 0$; confidence parameter with term $2\log(1/\delta) + d\log(1 + tL^2\kappa_\mu/(d\lambda))$; $\kappa_\mu > 0$.
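
The synthetic setting quoted in the Experiment Setup row (u = 200 users, K = 20 arms, d = 20, m ground-truth clusters, linear rewards) can be sketched roughly as follows. This is a minimal illustration, not the authors' code, which the paper does not release. The function names, the Gaussian sampling with unit-norm scaling, the uniform cluster assignment, and the logistic preference link are all assumptions made for the example.

```python
import numpy as np


def make_synthetic_setting(u=200, d=20, m=2, seed=0):
    """Build m cluster preference vectors and assign u users to them."""
    rng = np.random.default_rng(seed)
    # One preference vector per ground-truth cluster.
    # Assumption: Gaussian draw scaled to unit norm.
    thetas = rng.standard_normal((m, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    # Assumption: each user joins a cluster uniformly at random.
    cluster_of_user = rng.integers(0, m, size=u)
    return thetas, cluster_of_user, rng


def sample_arms(rng, K=20, d=20):
    """Draw K unit-norm arm feature vectors for one round (assumed scheme)."""
    arms = rng.standard_normal((K, d))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)
    return arms


def linear_rewards(theta, arms):
    """Linear reward theta^T x for every arm x, matching the quoted setting."""
    return arms @ theta


def duel(rng, theta, x1, x2):
    """Return 1 if x1 is preferred over x2, else 0.

    Assumption: preference follows a logistic (Bradley-Terry) link
    on the reward difference; the quoted text does not state the link.
    """
    p = 1.0 / (1.0 + np.exp(-(theta @ x1 - theta @ x2)))
    return int(rng.random() < p)


if __name__ == "__main__":
    thetas, cluster_of_user, rng = make_synthetic_setting(u=200, d=20, m=5)
    arms = sample_arms(rng, K=20, d=20)
    user = 0
    theta_user = thetas[cluster_of_user[user]]
    rewards = linear_rewards(theta_user, arms)
    best, worst = arms[np.argmax(rewards)], arms[np.argmin(rewards)]
    print("preference feedback for best vs. worst arm:", duel(rng, theta_user, best, worst))
```

Users in the same cluster share one preference vector, which is what lets a clustering algorithm pool their feedback. Changing `m` from 2 to 5 gives the second configuration mentioned in the table.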