Neural Thompson Sampling

Authors: Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental comparisons with other benchmark bandit algorithms on various data sets corroborate our theory." and "Finally, we corroborate the analysis with an empirical evaluation of the algorithm on several benchmarks. Experiments show that Neural TS yields competitive performance, in comparison with state-of-the-art baselines, thus suggesting its practical value in addition to strong theoretical guarantees."
Researcher Affiliation | Collaboration | Weitong Zhang, Department of Computer Science, University of California, Los Angeles, CA 90095, USA (wt.zhang@ucla.edu); Dongruo Zhou, Department of Computer Science, University of California, Los Angeles, CA 90095, USA (drzhou@cs.ucla.edu); Lihong Li, Google Research, USA (lihong@google.com); Quanquan Gu, Department of Computer Science, University of California, Los Angeles, CA 90095, USA (qgu@cs.ucla.edu)
Pseudocode | Yes | Algorithm 1: Neural Thompson Sampling (Neural TS); a hedged sketch of the algorithm loop is given below the table.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology.
Open Datasets | Yes | "This section gives an empirical evaluation of our algorithm in several public benchmark datasets, including adult, covertype, magic telescope, mushroom and shuttle, all from UCI (Dua & Graff, 2017), as well as MNIST (Le Cun et al., 2010)."
Dataset Splits | No | The paper does not explicitly provide training/validation/test splits as percentages, absolute counts, or predefined partitions. It mentions using public datasets and reshuffling the data for repeated runs.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | "One-hidden-layer neural networks with 100 neurons are used. Note that we do not choose m as suggested by theory, and such a disconnection has its root in the current deep learning theory based on neural tangent kernel, which is not specific to this work. During posterior updating, gradient descent is run for 100 iterations with learning rate 0.001." and "We set the time horizon of our algorithm to 10,000 for all data sets, except for mushroom which contains only 8,124 data." and "For the Neural UCB / Thompson Sampling methods, we use a grid search on λ ∈ {1, 10^-1, 10^-2, 10^-3} and ν ∈ {10^-1, 10^-2, 10^-3, 10^-4, 10^-5}."
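
As a complement to the Pseudocode and Experiment Setup entries above, the following is a minimal PyTorch-style sketch of a Neural TS decision loop under the reported configuration (one hidden layer with 100 neurons, 100 gradient-descent iterations per posterior update, learning rate 0.001). The class and method names (NeuralTS, select_arm, update), the diagonal approximation of the design matrix U, and the exact training objective are illustrative assumptions, not details confirmed by the table above or by the authors' implementation.

```python
import torch
import torch.nn as nn


class NeuralTS:
    """Minimal sketch of a Neural Thompson Sampling loop (names are illustrative).

    Each arm's reward is sampled from a Gaussian whose mean is the network
    prediction f(x; theta) and whose variance is built from the network
    gradient g(x; theta) and a design matrix U.  U is kept diagonal here
    purely for tractability; that approximation is an assumption of this sketch.
    """

    def __init__(self, dim, hidden=100, lam=1.0, nu=0.1, lr=1e-3, steps=100):
        # One-hidden-layer network with 100 neurons, trained for 100 gradient
        # steps with learning rate 0.001, mirroring the reported setup.
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.m, self.lam, self.nu, self.lr, self.steps = hidden, lam, nu, lr, steps
        self.theta0 = [p.detach().clone() for p in self.net.parameters()]
        n_params = sum(p.numel() for p in self.net.parameters())
        self.U = lam * torch.ones(n_params)      # diagonal surrogate for U = lam * I
        self.contexts, self.rewards = [], []

    def _grad(self, x):
        # Flattened gradient of the scalar prediction w.r.t. all network parameters.
        self.net.zero_grad()
        self.net(x).sum().backward()
        return torch.cat([p.grad.flatten() for p in self.net.parameters()])

    def select_arm(self, arm_contexts):
        # Thompson step: sample one reward per arm from its posterior, take the argmax.
        sampled = []
        for x in arm_contexts:
            g = self._grad(x)
            mean = self.net(x).item()
            var = self.lam * self.nu ** 2 * (g * g / self.U).sum().item() / self.m
            sampled.append(mean + (var ** 0.5) * torch.randn(1).item())
        return max(range(len(arm_contexts)), key=lambda a: sampled[a])

    def update(self, x, reward):
        # Update the (diagonal) design matrix, then retrain on all observed data.
        self.contexts.append(x)
        self.rewards.append(float(reward))
        g = self._grad(x)
        self.U += g * g / self.m
        X, y = torch.stack(self.contexts), torch.tensor(self.rewards).unsqueeze(1)
        opt = torch.optim.SGD(self.net.parameters(), lr=self.lr)
        for _ in range(self.steps):
            opt.zero_grad()
            # Squared loss with l2 regularization toward the initial parameters
            # (a simplification of the NTK-style training objective).
            reg = sum(((p - p0) ** 2).sum() for p, p0 in zip(self.net.parameters(), self.theta0))
            loss = 0.5 * ((self.net(X) - y) ** 2).sum() + 0.5 * self.m * self.lam * reg
            loss.backward()
            opt.step()
```

Following the Experiment Setup row, λ and ν in this sketch would be tuned by grid search over {1, 10^-1, 10^-2, 10^-3} and {10^-1, 10^-2, 10^-3, 10^-4, 10^-5}, respectively, with the bandit loop run for a horizon of 10,000 rounds (fewer for mushroom).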