A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits
Authors: Joel Q. L. Chang, Vincent Y. F. Tan
Pages: 6159-6166
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical simulations show that the regret bounds incurred by our algorithms are reasonably tight vis-a-vis algorithm-independent lower bounds. (...) Numerical Experiments We verify our theory via numerical experiments on ρ-NPTS for new risk measures that are linear combinations of existing ones. |
| Researcher Affiliation | Academia | Joel Q. L. Chang (1), Vincent Y. F. Tan (1, 2). (1) Department of Mathematics, National University of Singapore; (2) Department of Electrical and Computer Engineering, National University of Singapore |
| Pseudocode | Yes | Algorithm 1: ρ-MTS (...) Algorithm 2: ρ-NPTS |
| Open Source Code | Yes | The Java code to reproduce the plots in Figure 2 can be found at tinyurl.com/unifyRhoTs. |
| Open Datasets | No | The paper uses simulated data based on specified probability distributions (Beta(1, 3), Beta(3, 3), Beta(3, 1)). It does not use or provide access information for a pre-existing publicly available dataset. |
| Dataset Splits | No | The paper describes its simulation setup, including the number of arms, time steps, and experiments, but it does not specify explicit train/validation/test dataset splits. The experiments involve simulating bandit processes rather than using fixed datasets with defined splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It only mentions 'Numerical Experiments'. |
| Software Dependencies | No | The paper mentions 'The Java code to reproduce the plots' but does not specify the version of Java or any other software libraries with their version numbers that are necessary for reproducibility. |
| Experiment Setup | Yes | We consider a 3-arm bandit instance (i.e., K = 3) with a horizon of n = 5,000 time steps and over 50 experiments, where the arms 1, 2, 3 follow probability distributions Beta(1, 3), Beta(3, 3), Beta(3, 1) respectively. (...) Define the risk functionals $\rho_1 := \mathrm{MV}_{0.5} + \mathrm{CVaR}_{0.95}$ and $\rho_2 := \mathrm{Prop}_{0.7} + \mathrm{LB}_{0.6}$ on $(\mathcal{P}^{(B)}_c, D_L)$, where we set $(\gamma, \alpha, p, q) = (0.5, 0.95, 0.7, 0.6)$ as the parameters for the mean-variance, CVaR, Proportional risk hazard, and Lookback components respectively (see Table 1). |
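
The experiment setup in the last row can be illustrated with a minimal Python sketch that draws rewards from the three Beta arms and estimates the ρ1 functional empirically. It is not the authors' Java code and it does not implement ρ-MTS or ρ-NPTS. The function names are hypothetical, and the conventions used for the mean-variance and CVaR components (`gamma * mean - variance`, and the mean of the worst `1 - alpha` fraction of rewards) are assumptions that may differ from the definitions in the paper's Table 1.

```python
import random
import statistics

# Beta(a, b) parameters for arms 1, 2, 3, taken from the experiment setup.
ARMS = [(1, 3), (3, 3), (3, 1)]


def mean_variance(samples, gamma):
    """ASSUMED convention: gamma * mean - variance (larger is better)."""
    return gamma * statistics.fmean(samples) - statistics.pvariance(samples)


def cvar_lower_tail(samples, alpha):
    """ASSUMED convention: mean of the worst (1 - alpha) fraction of rewards."""
    ordered = sorted(samples)
    k = max(1, round((1 - alpha) * len(ordered)))  # size of the lower tail
    return statistics.fmean(ordered[:k])


def rho1(samples, gamma=0.5, alpha=0.95):
    """Empirical rho1 = MV_gamma + CVaR_alpha under the assumptions above."""
    return mean_variance(samples, gamma) + cvar_lower_tail(samples, alpha)


def estimate_rho1(n_samples=5000, seed=0):
    """Draw n_samples rewards per arm and return one rho1 estimate per arm."""
    rng = random.Random(seed)
    return [
        rho1([rng.betavariate(a, b) for _ in range(n_samples)])
        for a, b in ARMS
    ]


if __name__ == "__main__":
    for (a, b), value in zip(ARMS, estimate_rho1()):
        print(f"Beta({a}, {b}): empirical rho1 = {value:.4f}")
```

A plug-in estimate like this only indicates which arm a risk-averse learner would favour under the assumed conventions; reproducing the regret curves in Figure 2 would require the paper's algorithms and its exact risk definitions.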