Thompson Sampling Algorithms for Mean-Variance Bandits

Authors: Qiuyu Zhu, Vincent Tan

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive set of simulations: We provide extensive sets of simulations for both Gaussian and Bernoulli bandits to show that our algorithms outperform state-of-the-art algorithms for mean-variance bandits.
Researcher Affiliation | Academia | (1) Institute of Operations Research and Analytics, National University of Singapore, Singapore; (2) Department of Electrical and Computer Engineering, National University of Singapore, Singapore; (3) Department of Mathematics, National University of Singapore, Singapore.
Pseudocode | Yes | Algorithm 1 Update($\hat{\mu}_{i,t-1}, T_{i,t-1}, \alpha_{i,t-1}, \beta_{i,t-1}$) ... Algorithm 2 Thompson Sampling for Mean Learning (MTS) and Variance Learning (VTS) ... Algorithm 3 Thompson Sampling for Gaussian mean-variance bandits (MVTS) ... Algorithm 4 Thompson Sampling for Bernoulli mean-variance bandits (BMVTS)
Open Source Code | Yes | The R code for all our experiments is provided along with this submission.
Open Datasets | Yes | The K = 15 Gaussian arms are set to the same as the experiments from Sani et al. (2012) (i.e., $\mu$ = (0.1, 0.2, 0.23, 0.27, 0.32, 0.32, 0.34, 0.41, 0.43, 0.54, 0.55, 0.56, 0.67, 0.71, 0.79), $\sigma_i^2$ = (0.05, 0.34, 0.28, 0.09, 0.23, 0.72, 0.19, 0.14, 0.44, 0.53, 0.24, 0.36, 0.56, 0.49, 0.85)).
Dataset Splits | No | The paper discusses the time horizon and number of runs for simulations ('The time horizon n = 30,000 is fixed and the regret is averaged over 500 runs.') but does not specify explicit training, validation, or test dataset splits or cross-validation methodology.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'The R code for all our experiments is provided along with this submission.' but does not specify the version of R or any specific R packages/libraries with their version numbers.
Experiment Setup | Yes | The K = 15 Gaussian arms are set to the same as the experiments from Sani et al. (2012) (i.e., $\mu$ = (0.1, 0.2, 0.23, 0.27, 0.32, 0.32, 0.34, 0.41, 0.43, 0.54, 0.55, 0.56, 0.67, 0.71, 0.79), $\sigma_i^2$ = (0.05, 0.34, 0.28, 0.09, 0.23, 0.72, 0.19, 0.14, 0.44, 0.53, 0.24, 0.36, 0.56, 0.49, 0.85)). The time horizon n = 30,000 is fixed and the regret is averaged over 500 runs.
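
The quoted experiment setup can be written down as a small configuration, which may help when checking a re-implementation against the reported arm parameters. The following is a minimal Python sketch, not the authors' R code. The names `MU`, `SIGMA2`, `HORIZON`, `N_RUNS`, `pull`, `mean_variance`, and `best_arm` are hypothetical, and the risk-tolerance parameter `rho` and the convention MV_i = rho * mu_i - sigma_i^2 (larger is better) are assumptions that are not stated in this summary.

```python
import math
import random

# Arm parameters as quoted in the experiment setup (from Sani et al., 2012).
MU = (0.1, 0.2, 0.23, 0.27, 0.32, 0.32, 0.34, 0.41,
      0.43, 0.54, 0.55, 0.56, 0.67, 0.71, 0.79)
SIGMA2 = (0.05, 0.34, 0.28, 0.09, 0.23, 0.72, 0.19, 0.14,
          0.44, 0.53, 0.24, 0.36, 0.56, 0.49, 0.85)

HORIZON = 30_000  # time horizon n
N_RUNS = 500      # number of runs the regret is averaged over


def pull(arm, rng):
    """Draw one Gaussian reward from the given arm (0-indexed)."""
    # random.gauss expects a standard deviation, so take the square root.
    return rng.gauss(MU[arm], math.sqrt(SIGMA2[arm]))


def mean_variance(arm, rho):
    """Assumed convention: MV_i = rho * mu_i - sigma_i^2, larger is better."""
    return rho * MU[arm] - SIGMA2[arm]


def best_arm(rho):
    """Index of the arm with the largest mean-variance under the assumed convention."""
    return max(range(len(MU)), key=lambda i: mean_variance(i, rho))
```

Under this assumed convention, a risk tolerance of zero selects the lowest-variance arm and a very large risk tolerance selects the highest-mean arm, which gives a quick sanity check on the transcribed parameters.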