Bayesian Design Principles for Frequentist Sequential Learning
Authors: Yunbei Xu, Assaf Zeevi
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement Algorithm 4 (with the legend APS in the figures) in the stochastic, adversarial and non-stationary environments. We plot expected regret (average of 100 runs) for different choices of η, and set γ = 0.001 in all experiments. |
| Researcher Affiliation | Academia | Graduate School of Business, Columbia University, New York, New York, USA. Correspondence to: Yunbei Xu <yunbei.xu@gsb.columbia.edu>. |
| Pseudocode | Yes | Algorithm 1 Maximizing AIR to create algorithmic beliefs (only this caption is quoted in the report; a hypothetical loop skeleton appears after the table) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | No | The paper does not reference any public or released dataset. Its experiments use simulated multi-armed bandit environments whose rewards are generated during each run. |
| Dataset Splits | No | The paper describes running numerical experiments for multi-armed bandits and plots 'expected regret (average of 100 runs)', but it does not specify explicit train/validation/test dataset splits. In this context, data is generated interactively rather than being pre-partitioned (a minimal environment-generation sketch follows the table). |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used to conduct the numerical experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies, libraries, or solvers with their respective version numbers that were used to implement or run the experiments. |
| Experiment Setup | Yes | We plot expected regret (average of 100 runs) for different choices of η, and set γ = 0.001 in all experiments. We find APS 1) outperforms UCB and matches TS in the stochastic environment; 2) outperforms EXP3 in the adversarial environment; and 3) outperforms EXP3 and is comparable to the clairvoyant benchmarks (that have prior knowledge of the changes) in the non-stationary environment. For this reason we say Algorithm 4 (APS) achieves best-of-all-worlds performance. We note that the optimized choice of η in APS differs instance by instance, but with an initial tuning we typically see good results, whether or not η is tuned optimally. (A sketch of this run-averaging protocol follows the table.) |
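
As the Dataset Splits row notes, rewards here are generated interactively rather than loaded from a dataset. Below is a minimal sketch of the three environment types named in the report (stochastic, adversarial, non-stationary), assuming Bernoulli rewards; the specific means, change points, and adversarial pattern are illustrative assumptions of ours, not values from the paper.

```python
import numpy as np

def stochastic_rewards(T, means, rng):
    """I.i.d. Bernoulli rewards: a T x K table drawn once up front."""
    return rng.binomial(1, means, size=(T, len(means))).astype(float)

def nonstationary_rewards(T, means_per_phase, rng):
    """Piecewise-stationary rewards: the mean vector switches at fixed
    change points (illustrative; the paper's change points are not quoted here)."""
    K = len(means_per_phase[0])
    out = np.empty((T, K))
    for idx, means in zip(np.array_split(np.arange(T), len(means_per_phase)),
                          means_per_phase):
        out[idx] = rng.binomial(1, means, size=(len(idx), K))
    return out

def adversarial_rewards(T, K):
    """A simple oblivious adversary: a deterministic 0/1 pattern in which
    the best arm rotates over time (an illustrative choice, not the paper's)."""
    out = np.zeros((T, K))
    block = T // K + 1
    for t in range(T):
        out[t, (t // block) % K] = 1.0
    return out

# Rewards are produced by the environment as each run unfolds, so there is
# nothing to partition into train/validation/test splits.
rng = np.random.default_rng(0)
R = nonstationary_rewards(1000, [[0.9, 0.1], [0.1, 0.9]], rng)
```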
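The Pseudocode row quotes only the caption of Algorithm 1 ("Maximizing AIR to create algorithmic beliefs"); its body is not reproduced in this report. The skeleton below is therefore a hypothetical stand-in showing where such a belief update would sit in a sampling-based bandit loop with γ-mixed exploration. `run_belief_sampling`, `placeholder_update`, and every name in them are our placeholders, not the paper's method.

```python
import numpy as np

def run_belief_sampling(rewards, update_belief, eta, gamma, rng):
    """Generic belief-sampling bandit loop with gamma-mixed exploration.

    `update_belief` is a placeholder for the paper's Algorithm 1 (maximizing
    AIR to create algorithmic beliefs), whose body is not reproduced here.
    """
    T, K = rewards.shape
    belief = np.full(K, 1.0 / K)                     # uniform initial belief
    realized = 0.0
    for t in range(T):
        probs = (1.0 - gamma) * belief + gamma / K   # exploration mixing
        arm = rng.choice(K, p=probs)
        r = rewards[t, arm]                          # bandit feedback only
        realized += r
        belief = update_belief(belief, arm, r, probs, eta)
    return realized

def placeholder_update(belief, arm, r, probs, eta):
    """Exponential weights on an importance-weighted reward estimate --
    a stand-in update, NOT the paper's AIR-maximizing one."""
    logits = np.log(belief + 1e-12)                  # guard against log(0)
    logits[arm] += eta * r / probs[arm]
    w = np.exp(logits - logits.max())
    return w / w.sum()

rng = np.random.default_rng(0)
rewards = rng.binomial(1, [0.6, 0.4], size=(1000, 2)).astype(float)
print(run_belief_sampling(rewards, placeholder_update, eta=0.05, gamma=0.001, rng=rng))
```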
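To make the evaluation protocol in the Experiment Setup row concrete, here is a sketch of the report's averaging procedure (expected regret over 100 independent runs, γ = 0.001) applied to EXP3, one of the comparators named above. The EXP3 update is the standard one; the horizon, arm means, and η grid are illustrative assumptions, and APS itself is not implemented here.

```python
import numpy as np

def exp3(rewards, eta, gamma, rng):
    """EXP3 with learning rate eta and uniform exploration mixing gamma."""
    T, K = rewards.shape
    logw = np.zeros(K)                            # log-weights for stability
    realized = 0.0
    for t in range(T):
        w = np.exp(logw - logw.max())
        p = (1.0 - gamma) * w / w.sum() + gamma / K
        arm = rng.choice(K, p=p)
        r = rewards[t, arm]
        realized += r
        logw[arm] += eta * r / p[arm]             # importance-weighted estimate
    return realized

def expected_regret(make_rewards, algo, runs=100, seed=0):
    """Average regret over independent runs (the report's 100-run protocol)."""
    regrets = []
    for i in range(runs):
        rng = np.random.default_rng(seed + i)
        rewards = make_rewards(rng)
        best = rewards.sum(axis=0).max()          # best fixed arm in hindsight
        regrets.append(best - algo(rewards, rng))
    return float(np.mean(regrets))

if __name__ == "__main__":
    T, means = 2_000, np.array([0.5, 0.45, 0.4])  # illustrative instance
    make = lambda rng: rng.binomial(1, means, size=(T, len(means))).astype(float)
    for eta in (0.01, 0.05, 0.1):                 # illustrative eta grid
        reg = expected_regret(make, lambda rw, rng: exp3(rw, eta, 0.001, rng))
        print(f"eta={eta}: expected regret ~ {reg:.1f}")
```

Sweeping η this way mirrors the report's observation that the optimal η varies by instance but that a rough initial tuning already gives reasonable results.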