reproducibilityindex.ai

Double Thompson Sampling for Dueling Bandits

Authors: Huasen Wu, Xin Liu

NeurIPS 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments based on both synthetic and real-world data demonstrate that D-TS and D-TS+ significantly improve the overall performance, in terms of regret and robustness.
Researcher Affiliation	Academia	Huasen Wu, University of California, Davis, hswu@ucdavis.edu; Xin Liu, University of California, Davis, xinliu@ucdavis.edu
Pseudocode	Yes	Algorithm 1: D-TS for Copeland Dueling Bandits
Open Source Code	Yes	Source codes are available at https://github.com/HuasenWu/DuelingBandits.
Open Datasets	Yes	Here we present the results for experiments based on the Microsoft Learning to Rank (MSLR) dataset [24], which provides the relevance for queries and ranked documents. ... [24] Microsoft Research, Microsoft Learning to Rank Datasets. http://research.microsoft.com/en-us/projects/mslr/, 2010.
Dataset Splits	No	The paper uses the Microsoft Learning to Rank (MSLR) dataset and refers to 'two 5-armed submatrices in [6]' but does not provide specific percentages or counts for training, validation, or test splits for their experiments.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies	No	The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup	Yes	For BTM, we set the relaxed factor γ = 1.3 as [16]. For algorithms using RUCB and RLCB, including D-TS and D-TS+, we set the scale factor α = 0.51. For RMED1, we use the same settings as [5], and for ECW-RMED, we use the same setting as [7].
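
To make the role of the scale factor α = 0.51 concrete, the following is a minimal sketch of how RUCB/RLCB-style confidence bounds on pairwise preference probabilities are commonly computed. It is not taken from the authors' repository. The function name confidence_bounds, the wins matrix layout, the x/0 := 1 convention for uncompared pairs, and the diagonal value of 1/2 are all assumptions made for illustration.

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row above.
GAMMA_BTM = 1.3  # relaxed factor used for BTM
ALPHA = 0.51     # scale factor for RUCB / RLCB (used by D-TS and D-TS+)


def confidence_bounds(wins, t, alpha=ALPHA):
    """Return (upper, lower) bound matrices for pairwise preference estimates.

    wins[i, j] is the number of times arm i beat arm j (hypothetical layout);
    t is the current time step, t >= 1.
    """
    wins = np.asarray(wins, dtype=float)
    total = wins + wins.T  # number of comparisons for each pair
    with np.errstate(divide="ignore", invalid="ignore"):
        # Assumed convention: x/0 := 1 for pairs that were never compared.
        mean = np.where(total > 0, wins / total, 1.0)
        width = np.where(total > 0, np.sqrt(alpha * np.log(t) / total), 1.0)
    upper = mean + width
    lower = mean - width
    # Assumed convention: an arm compared with itself has bounds of exactly 1/2.
    np.fill_diagonal(upper, 0.5)
    np.fill_diagonal(lower, 0.5)
    return upper, lower


if __name__ == "__main__":
    # Toy counts: arm 0 beat arm 1 three times and lost once; arm 2 is untested.
    example_wins = np.array([[0, 3, 0], [1, 0, 0], [0, 0, 0]])
    u, l = confidence_bounds(example_wins, t=100)
    print(np.round(u, 3))
    print(np.round(l, 3))
```

A larger α widens the interval around each empirical win rate, so more arm pairs stay plausible for longer. Under these conventions the bounds are not clipped to [0, 1], and a pair with no comparisons gets the widest interval, [0, 2].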