Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling

Authors: Aadirupa Saha, Branislav Kveton

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we corroborate our theory with experiments, which demonstrate the benefit of our variance-adaptive Bayesian algorithm over prior frequentist works. We also show that our approach is robust to model misspecification and can be applied with estimated priors.
Researcher Affiliation | Industry | Aadirupa Saha¹, Branislav Kveton² (¹Apple, ²Amazon)
Pseudocode | Yes | Our algorithm is presented in Algorithm 1 (Appendix A) and we call it Gaussian TS... The complete pseudocode of Algorithm 1 is given in Appendix A. Our algorithm is presented in Algorithm 2 and we call it Var TS... The complete pseudocode of Algorithm 2 is given in Appendix C. (A hedged sketch of this kind of posterior sampling appears below the table.)
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes synthetic datasets generated from specific distributions (e.g., Bernoulli, Beta, and Gaussian bandits with given parameters) for its experiments, but it does not provide concrete access information (link, DOI, repository, or citation to an established public dataset).
Dataset Splits | No | The paper describes an online learning setting for bandit problems, which involves sequential interaction and reward observation rather than predefined dataset splits. Therefore, it does not provide explicit training/validation/test splits.
Hardware Specification | No | The paper does not specify any hardware details such as GPU or CPU models used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | In all simulations, the horizon is n = 2000 and they are averaged over 1000 randomly initialized runs. We start with a Bernoulli bandit with K = 10 arms. The mean reward of arm i ∈ [K], µi, is sampled i.i.d. from prior Beta(i, K + 1 − i). The third experiment is with a Gaussian bandit where both the means and variances of rewards are sampled i.i.d. from a prior with parameters µ0,i = i/(K + 1), κ0,i = K, α0,i = 4, and β0,i = 1. For a given Bayesian bandit, let µ̂i and v̂i be the estimated mean and variance of the mean reward of arm i sampled from its prior, respectively. Moreover, let λ̂i and ν̂i be the estimated mean and variance of the precision of the reward distribution of arm i sampled from its prior, respectively. Then, using these statistics, we estimate the unknown hyper-parameters of the prior as µ0,i = µ̂i, β0,i = λ̂i/ν̂i, α0,i = β0,i·λ̂i, and κ0,i = β0,i/(α0,i·v̂i). (A sketch of this moment-matching step also follows below the table.)
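
The Pseudocode row names the paper's two algorithms, Gaussian TS and Var TS, but their pseudocode is only available in the paper's appendices. The snippet below is a minimal sketch of the kind of variance-adaptive Thompson sampling the row refers to: a K-armed Gaussian bandit with a normal-gamma prior per arm, where each round samples a precision and then a mean for every arm and pulls the arm with the highest sampled mean. The update equations are the standard normal-gamma posterior updates, and the function and argument names (var_ts, rewards_fn) are illustrative assumptions, not the paper's Algorithm 2.

```python
# Sketch of variance-adaptive Thompson sampling for a K-armed Gaussian bandit
# with unknown per-arm means and variances, using a normal-gamma conjugate
# prior. Hyper-parameter names (mu0, kappa0, alpha0, beta0) follow the quoted
# setup; the update equations are standard normal-gamma posterior updates.
import numpy as np

def var_ts(rewards_fn, mu0, kappa0, alpha0, beta0, horizon, rng):
    """One run of normal-gamma Thompson sampling.

    rewards_fn(i) draws a reward for arm i; the prior arrays have length K.
    """
    K = len(mu0)
    # Per-arm sufficient statistics: pull count, reward sum, sum of squares.
    n, s, ss = np.zeros(K), np.zeros(K), np.zeros(K)
    pulls = np.zeros(horizon, dtype=int)
    for t in range(horizon):
        # Normal-gamma posterior parameters for every arm.
        kappa = kappa0 + n
        mean = np.where(n > 0, s / np.maximum(n, 1), 0.0)
        mu = (kappa0 * mu0 + s) / kappa
        alpha = alpha0 + n / 2
        beta = (beta0 + 0.5 * (ss - n * mean ** 2)
                + 0.5 * kappa0 * n * (mean - mu0) ** 2 / kappa)
        # Sample a precision, then a mean, for every arm; pull the argmax.
        lam = rng.gamma(alpha, 1.0 / beta)
        theta = rng.normal(mu, 1.0 / np.sqrt(kappa * lam))
        i = int(np.argmax(theta))
        r = rewards_fn(i)
        n[i] += 1; s[i] += r; ss[i] += r ** 2
        pulls[t] = i
    return pulls

if __name__ == "__main__":
    # Illustrative driver: true arm parameters and prior values are arbitrary.
    rng = np.random.default_rng(0)
    K = 10
    means = rng.normal(0.5, 0.1, K)   # true arm means (illustrative)
    sds = rng.uniform(0.1, 1.0, K)    # true arm standard deviations (illustrative)
    pulls = var_ts(lambda i: rng.normal(means[i], sds[i]),
                   mu0=np.full(K, 0.5), kappa0=np.ones(K),
                   alpha0=np.full(K, 2.0), beta0=np.ones(K),
                   horizon=2000, rng=rng)
    print(np.bincount(pulls, minlength=K))  # how often each arm was pulled
```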
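
The Experiment Setup row quotes the Gaussian-bandit prior and the moment-matching formulas used to estimate hyper-parameters from prior samples. The snippet below is a hedged illustration of that step: it draws mean/precision pairs from the stated normal-gamma prior (µ0,i = i/(K + 1), κ0,i = K, α0,i = 4, β0,i = 1), then plugs the empirical moments into the quoted formulas. The number of draws per arm (10,000) is an illustrative assumption, not a value reported in the paper.

```python
# Sketch of the quoted moment-matching prior estimation for the Gaussian bandit.
import numpy as np

rng = np.random.default_rng(0)
K = 10
i = np.arange(1, K + 1)
mu0 = i / (K + 1)
kappa0 = np.full(K, float(K))
alpha0 = np.full(K, 4.0)
beta0 = np.ones(K)

# Draw (mean, precision) pairs from the normal-gamma prior of every arm.
m = 10_000  # illustrative number of prior samples per arm
lam = rng.gamma(alpha0, 1.0 / beta0, size=(m, K))     # reward precisions
mu = rng.normal(mu0, 1.0 / np.sqrt(kappa0 * lam))     # mean rewards

# Empirical means and variances of the sampled mean rewards and precisions.
mu_hat, v_hat = mu.mean(axis=0), mu.var(axis=0)
lam_hat, nu_hat = lam.mean(axis=0), lam.var(axis=0)

# Moment-matched hyper-parameters, following the formulas quoted above.
mu0_est = mu_hat
beta0_est = lam_hat / nu_hat
alpha0_est = beta0_est * lam_hat
kappa0_est = beta0_est / (alpha0_est * v_hat)
```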