Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

Authors: Joel Q. L. Chang, Vincent Y. F. Tan (pp. 6159-6166)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical simulations show that the regret bounds incurred by our algorithms are reasonably tight vis-a-vis algorithm-independent lower bounds. (...) Numerical Experiments: We verify our theory via numerical experiments on ρ-NPTS for new risk measures that are linear combinations of existing ones.
Researcher Affiliation | Academia | Joel Q. L. Chang (1), Vincent Y. F. Tan (1, 2); (1) Department of Mathematics, National University of Singapore; (2) Department of Electrical and Computer Engineering, National University of Singapore
Pseudocode | Yes | Algorithm 1: ρ-MTS (...) Algorithm 2: ρ-NPTS
Open Source Code | Yes | The Java code to reproduce the plots in Figure 2 can be found at tinyurl.com/unifyRhoTs.
Open Datasets | No | The paper uses simulated data based on specified probability distributions (Beta(1, 3), Beta(3, 3), Beta(3, 1)). It does not use or provide access information for a pre-existing publicly available dataset.
Dataset Splits | No | The paper describes its simulation setup, including the number of arms, time steps, and experiments, but it does not specify explicit train/validation/test dataset splits. The experiments involve simulating bandit processes rather than using fixed datasets with defined splits.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It only mentions 'Numerical Experiments'.
Software Dependencies | No | The paper mentions 'The Java code to reproduce the plots' but does not specify the version of Java or any other software libraries with their version numbers that are necessary for reproducibility.
Experiment Setup | Yes | We consider a 3-arm bandit instance (i.e., K = 3) with a horizon of n = 5,000 time steps and over 50 experiments, where the arms 1, 2, 3 follow probability distributions Beta(1, 3), Beta(3, 3), Beta(3, 1) respectively. (...) Define the risk functionals ρ_1 := MV_0.5 + CVaR_0.95 and ρ_2 := Prop_0.7 + LB_0.6 on (P_c^(B), D_L), where we set (γ, α, p, q) = (0.5, 0.95, 0.7, 0.6) as the parameters for the mean-variance, CVaR, Proportional risk hazard, and Lookback components respectively (see Table 1).
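
As a rough illustration of the simulated instance quoted in the Experiment Setup row, the Java sketch below draws n = 5,000 rewards from each of Beta(1, 3), Beta(3, 3), and Beta(3, 1) and reports empirical statistics per arm. It is not the authors' code. The class and method names are hypothetical, the order-statistics sampler is a stand-in that only works for positive integer shape parameters, and the mean-variance convention gamma * mean - variance is an assumption, since the paper's Table 1 is not reproduced on this page. Neither ρ-MTS nor ρ-NPTS is implemented here.

```java
import java.util.Arrays;
import java.util.Random;

public class BetaArmsSketch {

    // Draw one Beta(a, b) sample for positive integers a and b.
    // Fact used: the a-th smallest of (a + b - 1) i.i.d. Uniform(0, 1)
    // draws follows a Beta(a, b) distribution.
    static double sampleBeta(int a, int b, Random rng) {
        double[] u = new double[a + b - 1];
        for (int i = 0; i < u.length; i++) {
            u[i] = rng.nextDouble();
        }
        Arrays.sort(u);
        return u[a - 1];
    }

    // Draw n i.i.d. rewards from a Beta(a, b) arm.
    static double[] sampleArm(int a, int b, int n, Random rng) {
        double[] rewards = new double[n];
        for (int t = 0; t < n; t++) {
            rewards[t] = sampleBeta(a, b, rng);
        }
        return rewards;
    }

    // Empirical mean of the observed rewards.
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Empirical (population) variance of the observed rewards.
    static double variance(double[] x) {
        double m = mean(x);
        double sumSq = 0.0;
        for (double v : x) {
            sumSq += (v - m) * (v - m);
        }
        return sumSq / x.length;
    }

    // ASSUMED convention, not taken from the paper:
    // mean-variance score = gamma * mean - variance.
    static double meanVariance(double[] x, double gamma) {
        return gamma * mean(x) - variance(x);
    }

    public static void main(String[] args) {
        int[][] shapes = {{1, 3}, {3, 3}, {3, 1}}; // arms 1, 2, 3 from the setup
        int n = 5000;                              // horizon quoted in the setup
        double gamma = 0.5;                        // mean-variance parameter quoted in the setup
        Random rng = new Random(0L);               // fixed seed, chosen arbitrarily

        for (int k = 0; k < shapes.length; k++) {
            double[] rewards = sampleArm(shapes[k][0], shapes[k][1], n, rng);
            System.out.printf("arm %d: mean=%.3f var=%.4f MV=%.3f%n",
                    k + 1, mean(rewards), variance(rewards), meanVariance(rewards, gamma));
        }
    }
}
```

The empirical means should sit close to the exact Beta means a / (a + b), that is 0.25, 0.5, and 0.75 for the three arms. This gives a quick sanity check on the sampler before any risk functional is layered on top.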