Feel-Good Thompson Sampling for Contextual Dueling Bandits

Authors: Xuheng Li, Heyang Zhao, Quanquan Gu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we evaluate our algorithm on synthetic data and observe that FGTS.CDB outperforms existing algorithms by a large margin.
Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, CA 90095, USA. Correspondence to: Quanquan Gu <qgu@cs.ucla.edu>.
Pseudocode | Yes | Algorithm 1 FGTS.CDB
Open Source Code | No | The paper describes the implementation of the algorithm in experiments ('In our experiment, we implement (8.1) with initial step size δ = 0.005'), but it does not provide a link to, or an explicit statement about the release of, source code for the methodology.
Open Datasets | No | The paper states 'We generate a total of |A_t| = 32 distinct arms with feature vectors randomly chosen from {±1}^d following the uniform distribution.' This indicates synthetic data generation rather than the use or provision of a publicly available dataset (a data-generation sketch follows the table).
Dataset Splits | No | The paper conducts experiments for T = 2500 rounds in a bandit setting, where data is observed sequentially. It does not describe traditional train/validation/test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper describes experiments performed 'through simulation' on 'synthetic data'. It does not specify any hardware components such as GPU models, CPU types, or memory amounts used for these simulations.
Software Dependencies | No | The paper mentions mathematical functions and algorithms (e.g., the 'logistic function σ(·)' and 'stochastic gradient Langevin dynamics (SGLD)'), but it does not list specific software libraries or programming languages with version numbers required to reproduce the experiments.
Experiment Setup | Yes | For each experiment, we run T = 2500 rounds. The dimension of feature vectors is set to d = 5, 10, 15. Each experiment comprises 10 independent runs. η and µ are set to 1 and T^{1/2}α, respectively. ... We implement (8.1) with initial step size δ = 0.005. After each round, we schedule the step size δ with decaying rate 0.99 to stabilize the optimization process. For the benchmarks, we select the hyperparameters, including the confidence radius in MaxInP and MaxPairUCB and the magnitude of the perturbations in CoLSTIM, to be the best-performing hyperparameter within {10^{-2}, 10^{-1}, 10^{0}, 10^{1}} (an SGLD and step-size-schedule sketch follows the table).
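The synthetic setup quoted in the Open Datasets row (32 arms with feature vectors drawn uniformly from {±1}^d, d in {5, 10, 15}) takes only a few lines of NumPy to reproduce. The sketch below is an illustration, not the authors' code: the paper excerpt only confirms the arm generation and the presence of a logistic link σ(·), so `theta_star`, the `duel` helper, and the assumption that P(i beats j) = σ(⟨x_i − x_j, θ*⟩) are hypothetical choices made here for concreteness.

```python
import numpy as np

def generate_arms(num_arms=32, d=5, rng=None):
    """Draw arm feature vectors uniformly from {+1, -1}^d,
    matching the quoted setup (|A_t| = 32, d in {5, 10, 15})."""
    rng = np.random.default_rng(0) if rng is None else rng
    return rng.choice([-1.0, 1.0], size=(num_arms, d))

def duel(theta_star, x_i, x_j, rng):
    """Sample binary preference feedback for a pair of arms.
    Assumption: P(i beats j) = sigma(<x_i - x_j, theta_star>),
    a standard linear contextual dueling-bandit model; the excerpt
    above only confirms that a logistic link sigma(.) is used."""
    p = 1.0 / (1.0 + np.exp(-(x_i - x_j) @ theta_star))
    return rng.binomial(1, p)

rng = np.random.default_rng(0)
arms = generate_arms(num_arms=32, d=5, rng=rng)
theta_star = rng.standard_normal(5)                # hypothetical ground-truth parameter
outcome = duel(theta_star, arms[0], arms[1], rng)  # 1 if arm 0 wins the duel
```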
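The Experiment Setup row states that the sampling step (8.1) is implemented with stochastic gradient Langevin dynamics (SGLD), an initial step size δ = 0.005 decayed by a factor of 0.99 after each of the T = 2500 rounds, and a baseline hyperparameter grid. A minimal sketch of that scheduling logic is given below; the potential gradient, the single inner SGLD step per round, and the omitted per-round bandit logic are placeholders assumed here, not the authors' implementation.

```python
import numpy as np

def sgld_step(theta, grad_potential, delta, rng):
    """One stochastic gradient Langevin dynamics update:
    theta <- theta - delta * grad U(theta) + sqrt(2 * delta) * xi,  xi ~ N(0, I).
    grad_potential stands in for the gradient of the negative log-posterior
    appearing in the paper's sampling step (8.1)."""
    noise = rng.standard_normal(theta.shape)
    return theta - delta * grad_potential(theta) + np.sqrt(2.0 * delta) * noise

rng = np.random.default_rng(0)
d = 5
theta = np.zeros(d)
grad_potential = lambda th: th  # placeholder: gradient of a standard normal prior only

T = 2500           # rounds, as quoted
delta = 0.005      # initial step size, as quoted
for t in range(T):
    # ... per-round FGTS.CDB logic (arm selection, duel, loss update) would go here ...
    theta = sgld_step(theta, grad_potential, delta, rng)
    delta *= 0.99  # decaying rate 0.99 after each round, as quoted

# Hyperparameter grid quoted for the baselines (MaxInP, MaxPairUCB, CoLSTIM):
baseline_grid = [1e-2, 1e-1, 1e0, 1e1]
```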