Bayesian Design Principles for Frequentist Sequential Learning

Authors: Yunbei Xu, Assaf Zeevi

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We implement Algorithm 4 (with the legend APS in the figures) in the stochastic, adversarial and non-stationary environments. We plot expected regret (average of 100 runs) for different choices of η, and set γ = 0.001 in all experiments."
Researcher Affiliation | Academia | Graduate School of Business, Columbia University, New York, New York, USA. Correspondence to: Yunbei Xu <yunbei.xu@gsb.columbia.edu>.
Pseudocode | Yes | "Algorithm 1: Maximizing AIR to create algorithmic beliefs"
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include a link to a code repository.
Open Datasets | No | No public datasets are used; the experiments run in synthetic bandit environments: "We implement Algorithm 4 (with the legend APS in the figures) in the stochastic, adversarial and non-stationary environments."
Dataset Splits | No | The paper describes numerical experiments for multi-armed bandits and plots "expected regret (average of 100 runs)", but it does not specify explicit train/validation/test splits; in this setting, data is generated interactively rather than pre-partitioned.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used to conduct the numerical experiments.
Software Dependencies | No | The paper does not specify any software dependencies, libraries, or solvers with their respective version numbers that were used to implement or run the experiments.
Experiment Setup | Yes | "We plot expected regret (average of 100 runs) for different choices of η, and set γ = 0.001 in all experiments. We find APS 1) outperforms UCB and matches TS in the stochastic environment; 2) outperforms EXP3 in the adversarial environment; and 3) outperforms EXP3 and is comparable to the clairvoyant benchmarks (which have prior knowledge of the changes) in the non-stationary environment. For this reason we say Algorithm 4 (APS) achieves best-of-all-worlds performance. We note that the optimal choice of η in APS differs instance by instance, but with an initial tuning we typically see good results whether or not η is tuned optimally."
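The evaluation protocol quoted above (expected regret averaged over 100 independent runs, γ fixed at 0.001, a sweep over η) can be sketched in a few lines. The sketch below uses EXP3, one of the baselines named in the setup, as a stand-in learner, since the update rule of the paper's Algorithm 4 (APS) is not quoted in this summary; the arm means, horizon, η grid, and the function name exp3 are illustrative assumptions, not values from the paper.

```python
import numpy as np

def exp3(means, T, eta, gamma, rng):
    """One run of EXP3 (gains version) on a stochastic Bernoulli bandit.

    Stand-in for the paper's APS; eta is the learning rate and gamma the
    uniform-exploration mixing weight. Returns cumulative pseudo-regret.
    """
    K = len(means)
    log_w = np.zeros(K)          # log-weights, for numerical stability
    best = max(means)
    cum, regret = 0.0, np.zeros(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        p = (1 - gamma) * w / w.sum() + gamma / K  # mix in uniform exploration
        a = rng.choice(K, p=p)
        r = float(rng.random() < means[a])         # Bernoulli reward
        log_w[a] += eta * r / p[a]                 # importance-weighted update
        cum += best - means[a]                     # pseudo-regret increment
        regret[t] = cum
    return regret

# Protocol from the quoted setup: average over 100 runs, gamma = 0.001,
# and a sweep over eta. Arm means and horizon are placeholders.
means = [0.5, 0.55, 0.45]
T, n_runs, gamma = 10_000, 100, 0.001
for eta in [0.01, 0.05, 0.1]:
    rng = np.random.default_rng(0)
    avg = np.mean([exp3(means, T, eta, gamma, rng)[-1] for _ in range(n_runs)])
    print(f"eta={eta}: expected regret at T={T}: {avg:.1f}")
```

Averaging the final cumulative regret over independent runs is what the paper's "expected regret (average of 100 runs)" curves report at every time step; plotting the full regret trajectories instead of only the endpoint reproduces figures of that form.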