Feel-Good Thompson Sampling for Contextual Dueling Bandits
Authors: Xuheng Li, Heyang Zhao, Quanquan Gu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we evaluate our algorithm on synthetic data and observe that FGTS.CDB outperforms existing algorithms by a large margin. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of California, Los Angeles, CA 90095, USA. Correspondence to: Quanquan Gu <qgu@cs.ucla.edu>. |
| Pseudocode | Yes | Algorithm 1 FGTS.CDB |
| Open Source Code | No | The paper describes the implementation of the algorithm in experiments ('In our experiment, we implement (8.1) with initial step size δ = 0.005'), but does not provide a specific link or explicit statement about the release of the source code for the methodology. |
| Open Datasets | No | The paper states 'We generate a total of |At| = 32 distinct arms with feature vectors randomly chosen from {±1}d following the uniform distribution.' This indicates synthetic data generation, not the use or provision of a publicly available dataset. |
| Dataset Splits | No | The paper conducts experiments for T=2500 rounds in a bandit setting, where data is observed sequentially. It does not describe traditional train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper describes experiments performed 'through simulation' and with 'synthetic data'. It does not specify any hardware components such as GPU models, CPU types, or memory amounts used for these simulations. |
| Software Dependencies | No | The paper mentions mathematical functions and algorithms (e.g., 'logistic function σ()', 'stochastic gradient Langevin dynamics (SGLD)'), but does not list specific software libraries or programming languages with version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | For each experiment, we run T = 2500 rounds. The dimension of feature vectors is set to d = 5, 10, 15. Each experiment comprises 10 independent runs. η and µ are set to 1 and T^{1/2}α, respectively. ... We implement (8.1) with initial step size δ = 0.005. After each round, we schedule the step size δ with decaying rate 0.99 to stabilize the optimization process. For the benchmarks, we select the hyperparameters, including the confidence radius in MaxInP and MaxPairUCB and the magnitude of perturbations in CoLSTIM, to be the best-performing hyperparameter within {10^{-2}, 10^{-1}, 10^{0}, 10^{1}}. |
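The decaying step-size schedule and baseline hyperparameter grid quoted in the experiment setup can be sketched as follows. This is a minimal illustration of the described schedule only (initial δ = 0.005, multiplicative decay 0.99 per round, T = 2500; grid {10^{-2}, ..., 10^{1}}); the variable and function names are ours, not the paper's.

```python
# Sketch of the step-size schedule described in the setup:
# initial step size delta = 0.005, decayed by factor 0.99 after each round.
T = 2500        # total number of rounds reported in the paper
DELTA0 = 0.005  # initial step size
DECAY = 0.99    # per-round decay rate

def step_size(t: int, delta0: float = DELTA0, decay: float = DECAY) -> float:
    """Step size at round t (0-indexed), i.e. after t decay steps."""
    return delta0 * decay ** t

# Hyperparameter grid searched for the baselines (MaxInP, MaxPairUCB, CoLSTIM):
grid = [10.0 ** k for k in range(-2, 2)]  # {1e-2, 1e-1, 1e0, 1e1}

print(step_size(0))        # first-round step size, 0.005
print(step_size(T - 1))    # final-round step size (heavily decayed)
print(grid)
```

Under this reading, the optimizer's step size shrinks geometrically over the 2500 rounds, which matches the stated goal of stabilizing the optimization late in training.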