Double Thompson Sampling for Dueling Bandits

Authors: Huasen Wu, Xin Liu

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments based on both synthetic and real-world data demonstrate that D-TS and D-TS+ significantly improve the overall performance, in terms of regret and robustness."
Researcher Affiliation | Academia | Huasen Wu, University of California, Davis (hswu@ucdavis.edu); Xin Liu, University of California, Davis (xinliu@ucdavis.edu)
Pseudocode | Yes | "Algorithm 1 D-TS for Copeland Dueling Bandits" (a sketch of the double-sampling idea follows the table)
Open Source Code | Yes | "Source codes are available at https://github.com/HuasenWu/DuelingBandits."
Open Datasets | Yes | "Here we present the results for experiments based on the Microsoft Learning to Rank (MSLR) dataset [24], which provides the relevance for queries and ranked documents. ... [24] Microsoft Research, Microsoft Learning to Rank Datasets. http://research.microsoft.com/en-us/projects/mslr/, 2010."
Dataset Splits | No | The paper uses the Microsoft Learning to Rank (MSLR) dataset and refers to 'two 5-armed submatrices in [6]' but does not provide specific percentages or counts for training, validation, or test splits for their experiments.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "For BTM, we set the relaxed factor γ = 1.3 as [16]. For algorithms using RUCB and RLCB, including D-TS and D-TS+, we set the scale factor α = 0.51. For RMED1, we use the same settings as [5], and for ECW-RMED, we use the same setting as [7]." (these values are collected in the settings sketch below)