Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits
Authors: Joel Q. L. Chang, Vincent Y. F. Tan
Pages: 6159-6166
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical simulations show that the regret bounds incurred by our algorithms are reasonably tight vis-a-vis algorithm-independent lower bounds. (...) Numerical Experiments We verify our theory via numerical experiments on ρ-NPTS for new risk measures that are linear combinations of existing ones. |
| Researcher Affiliation | Academia | Joel Q. L. Chang (1), Vincent Y. F. Tan (1, 2); (1) Department of Mathematics, National University of Singapore; (2) Department of Electrical and Computer Engineering, National University of Singapore |
| Pseudocode | Yes | Algorithm 1: ρ-MTS (...) Algorithm 2: ρ-NPTS |
| Open Source Code | Yes | The Java code to reproduce the plots in Figure 2 can be found at tinyurl.com/unifyRhoTs. |
| Open Datasets | No | The paper uses simulated data based on specified probability distributions (Beta(1, 3), Beta(3, 3), Beta(3, 1)). It does not use or provide access information for a pre-existing publicly available dataset. |
| Dataset Splits | No | The paper describes its simulation setup, including the number of arms, time steps, and experiments, but it does not specify explicit train/validation/test dataset splits. The experiments involve simulating bandit processes rather than using fixed datasets with defined splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It only mentions 'Numerical Experiments'. |
| Software Dependencies | No | The paper mentions 'The Java code to reproduce the plots' but does not specify the version of Java or any other software libraries with their version numbers that are necessary for reproducibility. |
| Experiment Setup | Yes | We consider a 3-arm bandit instance (i.e., K = 3) with a horizon of n = 5,000 time steps and over 50 experiments, where the arms 1, 2, 3 follow probability distributions Beta(1, 3), Beta(3, 3), Beta(3, 1) respectively. (...) Define the risk functionals $\rho_1 := \mathrm{MV}_{0.5} + \mathrm{CVaR}_{0.95}$ and $\rho_2 := \mathrm{Prop}_{0.7} + \mathrm{LB}_{0.6}$ on $(\mathcal{P}_c^{(B)}, D_L)$, where we set $(\gamma, \alpha, p, q) = (0.5, 0.95, 0.7, 0.6)$ as the parameters for the mean-variance, CVaR, Proportional risk hazard, and Lookback components respectively (see Table 1). |
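
To make the quoted experiment setup more concrete, the following Python sketch draws rewards from the three Beta arms and estimates the first risk functional, ρ1, for each arm. It is not the authors' Java code and it does not implement ρ-NPTS or ρ-MTS. The function names are hypothetical, and both formulas are assumptions, since the paper's Table 1 is not reproduced here: mean-variance is taken as γ · mean − variance, and CVaR as the average of the lowest α-fraction of rewards. The paper's own definitions may differ, and ρ2 is omitted because its components are not defined in this summary.

```python
import math
import random
import statistics

# Arm distributions and parameters quoted in the experiment setup above.
ARMS = [(1, 3), (3, 3), (3, 1)]  # Beta(a, b) parameters for arms 1, 2, 3
GAMMA, ALPHA = 0.5, 0.95         # mean-variance and CVaR parameters


def mean_variance(samples, gamma):
    """ASSUMED form: gamma * mean - population variance of the samples."""
    return gamma * statistics.fmean(samples) - statistics.pvariance(samples)


def cvar_lower_tail(samples, alpha):
    """ASSUMED convention: mean of the lowest ceil(alpha * n) rewards."""
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return statistics.fmean(ordered[:k])


def rho1(samples):
    """Sum of the two assumed components, mirroring rho1 = MV + CVaR."""
    return mean_variance(samples, GAMMA) + cvar_lower_tail(samples, ALPHA)


def estimate_rho1(a, b, n_samples=20000, seed=0):
    """Monte Carlo estimate of rho1 for a Beta(a, b) arm (seeded for repeatability)."""
    rng = random.Random(seed)
    return rho1([rng.betavariate(a, b) for _ in range(n_samples)])


if __name__ == "__main__":
    for arm, (a, b) in enumerate(ARMS, start=1):
        print(f"arm {arm}: Beta({a}, {b}) -> estimated rho1 = {estimate_rho1(a, b):.4f}")
```

Under these assumed definitions, arm 3 (Beta(3, 1)) obtains the largest ρ1 estimate, so it would be the arm a risk-averse learner should converge to. Whether this matches the optimal arm in the paper depends on the exact definitions in its Table 1.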