Thompson Sampling Algorithms for Mean-Variance Bandits
Authors: Qiuyu Zhu, Vincent Tan
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive set of simulations: We provide extensive sets of simulations for both Gaussian and Bernoulli bandits to show that our algorithms outperform state-of-the-art algorithms for mean-variance bandits. |
| Researcher Affiliation | Academia | ¹Institute of Operations Research and Analytics, National University of Singapore, Singapore; ²Department of Electrical and Computer Engineering, National University of Singapore, Singapore; ³Department of Mathematics, National University of Singapore, Singapore. |
| Pseudocode | Yes | Algorithm 1 Update($\hat{\mu}_{i,t-1}, T_{i,t-1}, \alpha_{i,t-1}, \beta_{i,t-1}$) ... Algorithm 2 Thompson Sampling for Mean Learning (MTS) and Variance Learning (VTS) ... Algorithm 3 Thompson Sampling for Gaussian mean-variance bandits (MVTS) ... Algorithm 4 Thompson Sampling for Bernoulli mean-variance bandits (BMVTS) |
| Open Source Code | Yes | The R code for all our experiments is provided along with this submission. |
| Open Datasets | Yes | The $K = 15$ Gaussian arms are the same as in the experiments of Sani et al. (2012), i.e. $\mu = (0.1, 0.2, 0.23, 0.27, 0.32, 0.32, 0.34, 0.41, 0.43, 0.54, 0.55, 0.56, 0.67, 0.71, 0.79)$ and $\sigma^2 = (0.05, 0.34, 0.28, 0.09, 0.23, 0.72, 0.19, 0.14, 0.44, 0.53, 0.24, 0.36, 0.56, 0.49, 0.85)$. |
| Dataset Splits | No | The paper discusses the time horizon and number of runs for simulations ('The time horizon n = 30,000 is fixed and the regret is averaged over 500 runs.') but does not specify explicit training, validation, or test dataset splits or cross-validation methodology. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'The R code for all our experiments is provided along with this submission.' but does not specify the version of R or any specific R packages/libraries with their version numbers. |
| Experiment Setup | Yes | The $K = 15$ Gaussian arms are the same as in the experiments of Sani et al. (2012), i.e. $\mu = (0.1, 0.2, 0.23, 0.27, 0.32, 0.32, 0.34, 0.41, 0.43, 0.54, 0.55, 0.56, 0.67, 0.71, 0.79)$ and $\sigma^2 = (0.05, 0.34, 0.28, 0.09, 0.23, 0.72, 0.19, 0.14, 0.44, 0.53, 0.24, 0.36, 0.56, 0.49, 0.85)$. The time horizon $n = 30{,}000$ is fixed and the regret is averaged over 500 runs. |
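The pseudocode row names an update step on $(\hat{\mu}_{i,t-1}, T_{i,t-1}, \alpha_{i,t-1}, \beta_{i,t-1})$ and a Gaussian mean-variance Thompson sampling procedure (MVTS) without giving their details. As a rough illustration only, the Python sketch below keeps a standard Normal-Gamma posterior per arm, samples a mean $\theta_i$ and a precision $\tau_i$, and plays the arm with the largest sampled score $\rho\theta_i - 1/\tau_i$. It assumes the objective $\rho\mu_i - \sigma_i^2$ is maximized (equivalently, $\sigma_i^2 - \rho\mu_i$ is minimized). The prior $\alpha = \beta = 1/2$, the one-pull-per-arm initialization, and the names `ArmPosterior`, `sample_score`, and `mvts` are hypothetical; this is not the authors' R code and may differ from Algorithms 1 and 3 in details.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ArmPosterior:
    """Normal-Gamma posterior over one Gaussian arm's mean and precision.

    The starting values alpha = beta = 0.5 are a hypothetical prior choice.
    """
    mu_hat: float = 0.0  # posterior estimate of the arm's mean
    count: int = 0       # number of pulls T_i so far
    alpha: float = 0.5   # Gamma shape parameter for the precision
    beta: float = 0.5    # Gamma rate parameter for the precision

    def update(self, x: float) -> None:
        """Standard conjugate Normal-Gamma update after observing reward x."""
        prev = self.count
        # The rate grows with the squared deviation from the previous mean estimate.
        self.beta += prev / (prev + 1) * (x - self.mu_hat) ** 2 / 2
        self.count = prev + 1
        self.mu_hat = (prev * self.mu_hat + x) / self.count
        self.alpha += 0.5

    def sample_score(self, rho: float, rng: random.Random) -> float:
        """Sample precision tau and mean theta, then return rho * theta - 1 / tau."""
        # random.gammavariate expects a scale parameter, so pass 1 / rate.
        tau = rng.gammavariate(self.alpha, 1.0 / self.beta)
        theta = rng.gauss(self.mu_hat, 1.0 / math.sqrt(self.count * tau))
        return rho * theta - 1.0 / tau


def mvts(means, variances, rho, horizon, seed=0):
    """Run a Gaussian mean-variance Thompson sampling sketch; return pull counts."""
    rng = random.Random(seed)
    arms = [ArmPosterior() for _ in means]

    def pull(i):
        # Simulated Gaussian reward with the arm's true mean and variance.
        arms[i].update(rng.gauss(means[i], math.sqrt(variances[i])))

    # Assumed initialization: play every arm once so each posterior is proper.
    for i in range(len(arms)):
        pull(i)
    for _ in range(horizon - len(arms)):
        scores = [arm.sample_score(rho, rng) for arm in arms]
        pull(max(range(len(arms)), key=scores.__getitem__))
    return [arm.count for arm in arms]
```

For example, `mvts([0.1, 0.55], [0.05, 0.24], rho=1.0, horizon=2000)` compares an arm with score $0.1 - 0.05 = 0.05$ against one with score $0.55 - 0.24 = 0.31$, so most pulls should go to the second arm. Sampling $1/\tau_i$ as the variance keeps the posterior conjugate, which is why only four sufficient statistics per arm need to be updated.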
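The experiment setup fixes the 15 arm means and variances, the horizon $n = 30{,}000$, and 500 runs, but which arm is mean-variance optimal depends on the risk-tolerance parameter $\rho$, which these rows do not state. The Python sketch below encodes the quoted configuration and computes the optimal arm and each arm's gap $\Delta_i = \max_j(\rho\mu_j - \sigma_j^2) - (\rho\mu_i - \sigma_i^2)$ for a chosen $\rho$. The $\rho$ values used and the names `mean_variance` and `mv_gaps` are hypothetical, and the score convention $\rho\mu_i - \sigma_i^2$ (larger is better) is an assumption.

```python
# Arm parameters quoted from the Experiment Setup row (same as Sani et al., 2012).
MEANS = [0.1, 0.2, 0.23, 0.27, 0.32, 0.32, 0.34, 0.41, 0.43,
         0.54, 0.55, 0.56, 0.67, 0.71, 0.79]
VARIANCES = [0.05, 0.34, 0.28, 0.09, 0.23, 0.72, 0.19, 0.14, 0.44,
             0.53, 0.24, 0.36, 0.56, 0.49, 0.85]
HORIZON = 30_000  # time horizon n
RUNS = 500        # number of independent runs the regret is averaged over


def mean_variance(mu, var, rho):
    """Assumed mean-variance score rho * mu - var (larger is better)."""
    return rho * mu - var


def mv_gaps(means, variances, rho):
    """Return the 0-based index of the MV-optimal arm and each arm's gap to it."""
    scores = [mean_variance(m, v, rho) for m, v in zip(means, variances)]
    best = max(range(len(scores)), key=scores.__getitem__)
    gaps = [scores[best] - s for s in scores]
    return best, gaps
```

With a hypothetical $\rho = 1$, the largest score is $0.55 - 0.24 = 0.31$, so `mv_gaps(MEANS, VARIANCES, 1.0)` selects 0-based index 10. With $\rho = 0$ the lowest-variance arm (index 0) is optimal, and a large $\rho$ such as 10 favors the highest-mean arm (index 14).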