Finite-Time Logarithmic Bayes Regret Upper Bounds

Authors: Alexia Atsidakou, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, Sujay Sanghavi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results are reported in Figure 1. We observe three major trends. First, the regret of Bayes UCB decreases as the problem becomes easier, either as σ0 → 0 or as θ0 → ∞. It is also lower than that of UCB1, which does not leverage the prior. Second, the regret bound of Bayes UCB is tighter than that of UCB1, because it captures the benefit of the prior. Finally, the logarithmic regret bounds are tighter than the O(√n) bound. In addition, the O(√n) bound depends on the prior only through σ0 and thus remains constant as the prior gap θ0 changes.
Researcher Affiliation | Collaboration | Alexia Atsidakou (University of Texas at Austin); Branislav Kveton (AWS AI Labs); Sumeet Katariya (Amazon); Constantine Caramanis (University of Texas at Austin); Sujay Sanghavi (University of Texas at Austin / Amazon)
Pseudocode | Yes | The paper presents Algorithm 1, Bayes UCB (a hedged sketch of this algorithm is given after the table).
Open Source Code | No | The paper contains no explicit statement or link indicating that source code for the described methodology is openly available.
Open Datasets | No | The paper describes synthetic experimental environments (Gaussian bandits and linear bandits) rather than using specific publicly available datasets with formal access information.
Dataset Splits | No | The paper describes simulation-based experiments and does not specify train/validation/test dataset splits. It mentions, “All results are averaged over 10 000 random runs.”
Hardware Specification | No | The paper states, “We experiment with UCB algorithms in two environments: Gaussian bandits (Section 3.1) and linear bandits with Gaussian rewards (Section 3.3). In both experiments, the horizon is n = 1 000 rounds. All results are averaged over 10 000 random runs.” No specific hardware details (e.g., GPU models or CPU types) used to run the experiments are provided.
Software Dependencies | No | The paper does not specify any software names with version numbers used for the experiments.
Experiment Setup | Yes | The first problem is a K-armed bandit with K = 10 actions (Section 3.1). The prior width is σ0 = 1. The prior mean is µ0, where µ0,1 = θ0 and µ0,a = 0 for a > 1. We set θ0 = 1 and call it the prior gap. We vary σ0 and θ0 in our experiments... (A simulation sketch of this setup is given below.)
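
For concreteness, below is a minimal sketch of a Bayes-UCB-style algorithm on the Gaussian bandit setup above (K = 10, σ0 = 1, θ0 = 1, n = 1 000 rounds). This is an illustration under assumed choices, not the paper's implementation: the function name bayes_ucb_gaussian is hypothetical, the confidence schedule (sqrt(2 log n) posterior standard deviations) and the reward noise σ = 1 are assumptions, and it averages 100 runs rather than the paper's 10 000.

```python
import numpy as np

def bayes_ucb_gaussian(mu0, sigma0, sigma, n, theta, rng):
    """One run of a Bayes-UCB-style algorithm on a K-armed Gaussian bandit.

    mu0:    prior means of the arm means (vector of length K)
    sigma0: prior width, shared by all arms
    sigma:  known reward-noise standard deviation (an assumption here)
    theta:  true arm means, drawn from the prior by the caller
    """
    K = len(mu0)
    counts = np.zeros(K)   # pulls per arm
    sums = np.zeros(K)     # summed rewards per arm
    regret = np.zeros(n)
    best = theta.max()
    c = np.sqrt(2.0 * np.log(n))  # assumed confidence schedule, not the paper's
    for t in range(n):
        # Conjugate Gaussian posterior of each arm mean given prior + data
        post_var = 1.0 / (1.0 / sigma0**2 + counts / sigma**2)
        post_mean = post_var * (mu0 / sigma0**2 + sums / sigma**2)
        a = np.argmax(post_mean + c * np.sqrt(post_var))  # optimistic index
        r = theta[a] + sigma * rng.standard_normal()      # observe noisy reward
        counts[a] += 1
        sums[a] += r
        regret[t] = best - theta[a]
    return regret.cumsum()

# Setup from Section 3.1: K = 10 arms, prior gap theta0 on arm 1
K, n, sigma0, sigma, theta0 = 10, 1000, 1.0, 1.0, 1.0
mu0 = np.zeros(K)
mu0[0] = theta0
rng = np.random.default_rng(0)
runs = [bayes_ucb_gaussian(mu0, sigma0, sigma, n,
                           theta=mu0 + sigma0 * rng.standard_normal(K), rng=rng)
        for _ in range(100)]  # the paper averages over 10 000 runs
avg_cum_regret = np.mean(runs, axis=0)
print(avg_cum_regret[-1])
```

Drawing the true means θ from the prior in each run matches the Bayesian evaluation protocol the paper's regret figures suggest; shrinking σ0 or growing θ0 in this sketch should reproduce the qualitative trend in Figure 1, with regret decreasing as the problem gets easier.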