A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

Authors: Joel Q. L. Chang, Vincent Y. F. Tan
Pages: 6159-6166

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical simulations show that the regret bounds incurred by our algorithms are reasonably tight vis-a-vis algorithm-independent lower bounds. (...) Numerical Experiments We verify our theory via numerical experiments on ρ-NPTS for new risk measures that are linear combinations of existing ones.
Researcher Affiliation | Academia | Joel Q. L. Chang (1), Vincent Y. F. Tan (1, 2). (1) Department of Mathematics, National University of Singapore; (2) Department of Electrical and Computer Engineering, National University of Singapore
Pseudocode | Yes | Algorithm 1: ρ-MTS (...) Algorithm 2: ρ-NPTS
Open Source Code | Yes | The Java code to reproduce the plots in Figure 2 can be found at tinyurl.com/unifyRhoTs.
Open Datasets | No | The paper uses simulated data based on specified probability distributions (Beta(1, 3), Beta(3, 3), Beta(3, 1)). It does not use or provide access information for a pre-existing publicly available dataset.
Dataset Splits | No | The paper describes its simulation setup, including the number of arms, time steps, and experiments, but it does not specify explicit train/validation/test dataset splits. The experiments involve simulating bandit processes rather than using fixed datasets with defined splits.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It only mentions 'Numerical Experiments'.
Software Dependencies | No | The paper mentions 'The Java code to reproduce the plots' but does not specify the version of Java or any other software libraries with their version numbers that are necessary for reproducibility.
Experiment Setup | Yes | We consider a 3-arm bandit instance (i.e., K = 3) with a horizon of n = 5,000 time steps and over 50 experiments, where the arms 1, 2, 3 follow probability distributions Beta(1, 3), Beta(3, 3), Beta(3, 1) respectively. (...) Define the risk functionals ρ1 := MV_0.5 + CVaR_0.95 and ρ2 := Prop_0.7 + LB_0.6 on (P_c^(B), D_L), where we set (γ, α, p, q) = (0.5, 0.95, 0.7, 0.6) as the parameters for the mean-variance, CVaR, Proportional risk hazard, and Lookback components respectively (see Table 1).
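As a rough illustration of the experiment setup quoted above, the following Java sketch draws rewards from the three Beta arms and evaluates an empirical version of ρ1 on each arm's samples. It is not the authors' code and does not implement ρ-MTS or ρ-NPTS. The class and method names are hypothetical, and the formulas used for the mean-variance and CVaR components are assumed conventions (the paper's own definitions are in its Table 1, which is not reproduced here), so the printed values may not match the paper's.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: all names are illustrative, not taken from the paper's code.
final class BetaArmRiskSketch {

    // Draws Beta(a, b) for positive integers a, b, using the fact that the a-th
    // smallest of (a + b - 1) independent Uniform(0, 1) draws is Beta(a, b).
    static double sampleBeta(int a, int b, Random rng) {
        double[] u = new double[a + b - 1];
        for (int i = 0; i < u.length; i++) {
            u[i] = rng.nextDouble();
        }
        Arrays.sort(u);
        return u[a - 1];
    }

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Population variance (divides by n, not n - 1).
    static double variance(double[] x) {
        double m = mean(x);
        double sq = 0.0;
        for (double v : x) {
            sq += (v - m) * (v - m);
        }
        return sq / x.length;
    }

    // ASSUMED convention: MV_gamma = gamma * mean - variance (larger is better).
    static double meanVariance(double[] x, double gamma) {
        return gamma * mean(x) - variance(x);
    }

    // ASSUMED convention: CVaR_alpha = mean of the lowest ceil(alpha * n) rewards.
    static double cvar(double[] x, double alpha) {
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        int k = (int) Math.ceil(alpha * sorted.length);
        k = Math.max(1, Math.min(k, sorted.length));
        return mean(Arrays.copyOf(sorted, k));
    }

    // Empirical rho1 = MV_0.5 + CVaR_0.95 under the assumed conventions above.
    static double rho1(double[] x) {
        return meanVariance(x, 0.5) + cvar(x, 0.95);
    }

    public static void main(String[] args) {
        int[][] arms = {{1, 3}, {3, 3}, {3, 1}}; // Beta(1, 3), Beta(3, 3), Beta(3, 1)
        int n = 5000;                            // samples per arm, matching the horizon
        Random rng = new Random(0L);             // fixed seed, chosen arbitrarily
        for (int i = 0; i < arms.length; i++) {
            double[] rewards = new double[n];
            for (int t = 0; t < n; t++) {
                rewards[t] = sampleBeta(arms[i][0], arms[i][1], rng);
            }
            System.out.printf("arm %d: empirical rho1 = %.4f%n", i + 1, rho1(rewards));
        }
    }
}
```

The order-statistic sampler is used only to keep the sketch free of external libraries; it works because all Beta parameters in the setup are integers. Note that this samples each arm n times in isolation, whereas a bandit run would allocate the n pulls across arms according to the algorithm.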