Finite-Time Logarithmic Bayes Regret Upper Bounds
Authors: Alexia Atsidakou, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, Sujay Sanghavi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results are reported in Figure 1. We observe three major trends. First, the regret of Bayes UCB decreases as the problem becomes easier, either σ0 → 0 or θ0 → ∞. It is also lower than that of UCB1, which does not leverage the prior. Second, the regret bound of Bayes UCB is tighter than that of UCB1, due to capturing the benefit of the prior. Finally, the logarithmic regret bounds are tighter than the O(√n) bound. In addition, the O(√n) bound depends on the prior only through σ0 and thus remains constant as the prior gap θ0 changes. |
| Researcher Affiliation | Collaboration | Alexia Atsidakou (University of Texas, Austin); Branislav Kveton (AWS AI Labs); Sumeet Katariya (Amazon); Constantine Caramanis (University of Texas, Austin); Sujay Sanghavi (University of Texas, Austin / Amazon) |
| Pseudocode | Yes | The method is presented as pseudocode in "Algorithm 1 Bayes UCB" (a minimal sketch of the index computation appears after this table). |
| Open Source Code | No | The paper contains no explicit statement or link indicating that the source code for the described methodology is openly available. |
| Open Datasets | No | The paper describes synthetic experimental environments (Gaussian bandits and linear bandits) rather than using specific publicly available datasets with formal access information. |
| Dataset Splits | No | The paper describes simulation-based experiments and does not specify train/validation/test dataset splits. It mentions, “All results are averaged over 10 000 random runs.” |
| Hardware Specification | No | The paper states, “We experiment with UCB algorithms in two environments: Gaussian bandits (Section 3.1) and linear bandits with Gaussian rewards (Section 3.3). In both experiments, the horizon is n = 1 000 rounds. All results are averaged over 10 000 random runs.” No specific hardware details (e.g., GPU models, CPU types) used for running the experiments are provided. |
| Software Dependencies | No | The paper does not specify any software names with version numbers used for the experiments. |
| Experiment Setup | Yes | The first problem is a K-armed bandit with K = 10 actions (Section 3.1). The prior width is σ0 = 1. The prior mean is µ0, where µ0,1 = θ0 and µ0,a = 0 for a > 1. We set θ0 = 1 and call it the prior gap. We vary σ0 and θ0 in our experiments... (A simulation sketch of this setup follows the table.) |
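
To make the "Algorithm 1 Bayes UCB" row concrete, here is a minimal Python sketch of a posterior-based UCB index for Gaussian rewards with a Gaussian prior. The conjugate posterior update is standard; the confidence multiplier of sqrt(2 log n) posterior standard deviations is our assumption for illustration, not a value taken from the paper.

```python
import numpy as np

def bayes_ucb_index(mu0, sigma0, sum_y, count, sigma, n):
    """Posterior-based UCB index for one arm of a Gaussian bandit.

    Prior on the arm mean: N(mu0, sigma0^2); reward noise: N(0, sigma^2).
    With `count` pulls whose rewards sum to `sum_y`, the conjugate
    posterior of the arm mean is N(mu_post, var_post).
    """
    var_post = 1.0 / (1.0 / sigma0 ** 2 + count / sigma ** 2)
    mu_post = var_post * (mu0 / sigma0 ** 2 + sum_y / sigma ** 2)
    # Assumed confidence width of sqrt(2 log n) posterior standard
    # deviations; the paper's exact quantile choice may differ.
    return mu_post + np.sqrt(var_post) * np.sqrt(2.0 * np.log(n))
```

With count = 0 the index reduces to the prior mean plus a prior-width bonus, which is precisely how the prior information enters the exploration term.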
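The Gaussian-bandit setup from the "Experiment Setup" row can be simulated as below, reusing `bayes_ucb_index` from the sketch above. The parameters K = 10, n = 1000, σ0 = 1, and θ0 = 1 come from the paper; the number of runs is reduced from the paper's 10 000 for speed, and the unit reward-noise scale σ = 1 is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

K, n = 10, 1000                    # arms and horizon (from the paper)
num_runs = 100                     # paper averages over 10 000 runs
sigma0, theta0 = 1.0, 1.0          # prior width and prior gap (from the paper)
sigma = 1.0                        # reward noise scale (our assumption)

mu0 = np.zeros(K)
mu0[0] = theta0                    # arm 1 has prior mean theta0, the rest 0

per_round_regret = np.zeros(n)
for _ in range(num_runs):
    theta = rng.normal(mu0, sigma0)     # draw a bandit instance from the prior
    sum_y, count = np.zeros(K), np.zeros(K)
    for t in range(n):
        # With no pulls the index falls back to the prior, so no
        # forced initialization round is needed.
        idx = [bayes_ucb_index(mu0[a], sigma0, sum_y[a], count[a], sigma, n)
               for a in range(K)]
        a = int(np.argmax(idx))
        sum_y[a] += rng.normal(theta[a], sigma)
        count[a] += 1
        per_round_regret[t] += theta.max() - theta[a]

print("estimated cumulative Bayes regret:", per_round_regret.sum() / num_runs)
```

Plotting the cumulative sum of `per_round_regret` against a UCB1 baseline, which ignores µ0 and σ0, would reproduce the qualitative comparison described in the "Research Type" row.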