Double Thompson Sampling for Dueling Bandits
Authors: Huasen Wu, Xin Liu
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments based on both synthetic and real-world data demonstrate that D-TS and D-TS+ significantly improve the overall performance, in terms of regret and robustness. |
| Researcher Affiliation | Academia | Huasen Wu University of California, Davis hswu@ucdavis.edu Xin Liu University of California, Davis xinliu@ucdavis.edu |
| Pseudocode | Yes | Algorithm 1 D-TS for Copeland Dueling Bandits (an illustrative sketch follows the table) |
| Open Source Code | Yes | Source codes are available at https://github.com/HuasenWu/DuelingBandits. |
| Open Datasets | Yes | Here we present the results for experiments based on the Microsoft Learning to Rank (MSLR) dataset [24], which provides the relevance for queries and ranked documents. ... [24] Microsoft Research, Microsoft Learning to Rank Datasets. http://research.microsoft.com/en-us/projects/mslr/, 2010. |
| Dataset Splits | No | The paper uses the Microsoft Learning to Rank (MSLR) dataset and refers to 'two 5-armed submatrices in [6]' but does not provide specific percentages or counts for training, validation, or test splits for their experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | For BTM, we set the relaxed factor γ = 1.3 as in [16]. For algorithms using RUCB and RLCB, including D-TS and D-TS+, we set the scale factor α = 0.51. For RMED1, we use the same settings as [5], and for ECW-RMED, we use the same settings as [7]. |
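The table above quotes the paper's Algorithm 1 (D-TS) and its experiment settings, including the RUCB/RLCB scale factor α = 0.51. The sketch below is a minimal, simplified Python rendering of the double-sampling structure that the algorithm name describes; it is not the authors' released implementation. The function name `d_ts`, the simulator-style interface (`preference_matrix`, `horizon`), and the exact candidate-pruning details are assumptions for illustration.

```python
import numpy as np

def d_ts(preference_matrix, horizon, alpha=0.51, rng=None):
    """Simplified sketch of Double Thompson Sampling (D-TS) for Copeland
    dueling bandits. preference_matrix[i, j] is the probability that
    arm i beats arm j in a duel (illustrative simulator interface)."""
    rng = np.random.default_rng() if rng is None else rng
    K = preference_matrix.shape[0]
    B = np.zeros((K, K))  # B[i, j]: number of times arm i has beaten arm j

    for t in range(1, horizon + 1):
        N = B + B.T  # total comparisons for each pair
        with np.errstate(divide="ignore", invalid="ignore"):
            p_hat = np.where(N > 0, B / N, 0.5)  # empirical win rates
        bonus = np.sqrt(alpha * np.log(t) / np.maximum(N, 1))
        ucb = np.where(N > 0, p_hat + bonus, 1.0)  # RUCB
        lcb = np.where(N > 0, p_hat - bonus, 0.0)  # RLCB
        np.fill_diagonal(ucb, 0.5)
        np.fill_diagonal(lcb, 0.5)

        # Candidate arms: those whose upper-bound Copeland score is maximal.
        ub_copeland = (ucb > 0.5).sum(axis=1)
        candidates = np.flatnonzero(ub_copeland == ub_copeland.max())

        # Phase 1: sample a full preference matrix from the Beta posteriors
        # and pick the first arm as a sampled Copeland winner among candidates.
        theta = np.full((K, K), 0.5)
        for i in range(K):
            for j in range(i + 1, K):
                theta[i, j] = rng.beta(B[i, j] + 1, B[j, i] + 1)
                theta[j, i] = 1.0 - theta[i, j]
        copeland = (theta > 0.5).sum(axis=1)
        best = candidates[copeland[candidates] == copeland[candidates].max()]
        a1 = int(rng.choice(best))

        # Phase 2: resample the duels against a1 and pick the second arm
        # among arms whose comparison with a1 is still uncertain (RLCB <= 1/2).
        theta2 = np.full(K, 0.5)
        for i in range(K):
            if i == a1:
                continue
            if lcb[i, a1] <= 0.5:
                theta2[i] = rng.beta(B[i, a1] + 1, B[a1, i] + 1)
            else:
                theta2[i] = -np.inf  # excluded: a1 is confidently better
        a2 = int(np.argmax(theta2))

        # Duel a1 against a2 and update the win counts.
        if rng.random() < preference_matrix[a1, a2]:
            B[a1, a2] += 1
        else:
            B[a2, a1] += 1
    return B
```

As a usage example under these assumptions, `d_ts(P, horizon=10000)` on a 5-armed preference matrix `P` returns the pairwise win counts, from which a cumulative Copeland regret curve can be computed for comparison against the baselines listed in the experiment setup.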