Likelihood Ratio Confidence Sets for Sequential Decision Making
Authors: Nicolas Emmenegger, Mojmir Mutny, Andreas Krause
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We showcase the practical strength of our method on generalized linear bandit problems, survival analysis, and bandits with various additive noise distributions." (Section 4, "Application: Linear and Kernelized Bandits"; Section 4.2, "Experimental Evaluation".) Figure 2 caption: "Bandit experiments: On the y-axis we report cumulative regret, while the x-axis shows the number of iterations. In a) and b) we report the results for linear models with different parametric additive noise. In c) we report the results on a survival analysis with a log-Weibull distribution (p = 2), and in d) we showcase Poisson bandits." |
| Researcher Affiliation | Academia | Nicolas Emmenegger (ETH Zürich); Mojmír Mutný (ETH Zürich); Andreas Krause (ETH Zürich) |
| Pseudocode | Yes | Algorithm 1 (Constructing the LR Confidence Sequence). 1: Input: convex set Θ ⊆ ℝ^d, confidence level α > 0, likelihood p_θ(y\|x), regularizers {ψ_t}_t. 2: for t ∈ ℕ_0 do. 3: θ̂_t = argmin_{θ∈Θ} Σ_{s=1}^{t−1} −log p_θ(y_s \| x_s) + ψ_t(θ) ▷ FTRL. 4: w_t = (1/L) / (1/L + bias²_{x_t}(θ̂_t)) (THIS WORK) or w_t = 1 (CLASSICAL) ▷ bias-weighting bias_{x_t}(θ̂_t) as in Eq. (5) or Eq. (6). 5: C_t = {θ ∈ Θ : Π_{s=1}^t [p_{θ̂_s}(y_s \| x_s) / p_θ(y_s \| x_s)]^{w_s} ≤ 1/α} ▷ Confidence set. A runnable sketch of this loop follows the table. |
| Open Source Code | No | No explicit statement or link regarding the release of source code for the methodology described in this paper was found. |
| Open Datasets | Yes | "The examples in Fig. 2 use the true payoff functions r(x) = (1.4 − 3x) sin(18x), which we model as an element of a RKHS with squared exponential kernel lengthscale γ = 6 × 10⁻² on [0, 1.2], which is the baseline function no. 4 in the global optimization benchmark database infinity77 (Gavana, 2021)." Reference: Gavana, A. (2021). Infinity77 global optimization benchmarks and AMPGO. http://infinity77.net/global_optimization/index.html |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages, sample counts, or explicit predefined split references) needed to reproduce the data partitioning for training, validation, and testing. |
| Hardware Specification | No | No specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | "The examples in Fig. 2 use the true payoff functions r(x) = (1.4 − 3x) sin(18x), which we model as an element of a RKHS with squared exponential kernel lengthscale γ = 6 × 10⁻² on [0, 1.2]..." and "We include such sets as a baseline without provable coverage as well. The main take-home message from the experiments is that among all the estimators and confidence sets that enjoy provable coverage, our confidence sets perform the best, on par with successful heuristics. For all our numerical experiments in Figure 2, the true payoff function is assumed to be an infinite-dimensional RKHS element. For further details and experiments, please refer to App. E." and "in both cases they perform as well as heuristic confidence sets with confidence parameter β_t = 2 log(1/δ)". A sketch of this benchmark setup also follows the table. |
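
To make the quoted pseudocode concrete, here is a minimal Python sketch of Algorithm 1 for a linear-Gaussian model with the classical weighting w_t = 1 and a ridge regularizer. The helper names (`gaussian_loglik`, `in_lr_confidence_set`) and all constants are illustrative assumptions; the paper's bias-weighting of Eq. (5)/(6) is not reproduced here.

```python
import numpy as np

def gaussian_loglik(theta, x, y, sigma=1.0):
    """Log-density of y | x under a linear-Gaussian model p_theta(y|x) = N(x^T theta, sigma^2)."""
    mu = x @ theta
    return -0.5 * ((y - mu) / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma ** 2)

def in_lr_confidence_set(theta, estimates, xs, ys, ws, alpha=0.05):
    """Membership test for C_t = {theta : prod_s [p_{theta_hat_s}/p_theta]^{w_s} <= 1/alpha},
    evaluated in log-space for numerical stability."""
    log_ratio = sum(
        w * (gaussian_loglik(th_hat, x, y) - gaussian_loglik(theta, x, y))
        for th_hat, x, y, w in zip(estimates, xs, ys, ws)
    )
    return log_ratio <= np.log(1.0 / alpha)

# Sequential loop: the FTRL step with squared loss and ridge regularizer
# psi_t(theta) = lam * ||theta||^2 reduces to regularized least squares.
rng = np.random.default_rng(0)
d, T, lam = 2, 50, 1.0
theta_star = np.array([0.5, -0.3])          # unknown true parameter (toy choice)
xs, ys, estimates, ws = [], [], [], []
for t in range(T):
    x_t = rng.normal(size=d)
    if xs:  # theta_hat_t is computed from rounds 1..t-1, before observing y_t
        X, Y = np.stack(xs), np.array(ys)
        theta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    else:
        theta_hat = np.zeros(d)
    estimates.append(theta_hat)
    ws.append(1.0)                          # classical weighting; bias-weighting would go here
    y_t = x_t @ theta_star + rng.normal()
    xs.append(x_t)
    ys.append(y_t)

# The LR process is a supermartingale under theta_star, so this holds w.p. >= 1 - alpha.
print(in_lr_confidence_set(theta_star, estimates, xs, ys, ws))
```

In practice one would test membership over a candidate grid or solve the induced convex constraint; the point-wise test above only illustrates the set's defining inequality.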
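
The benchmark setup quoted in the Open Datasets and Experiment Setup rows can likewise be sketched. The payoff r(x) and the lengthscale γ = 6 × 10⁻² come from the quoted text; the function names and the grid resolution below are assumptions for illustration.

```python
import numpy as np

def payoff(x):
    """True payoff r(x) = (1.4 - 3x) * sin(18x) on [0, 1.2], as quoted above."""
    return (1.4 - 3.0 * x) * np.sin(18.0 * x)

def se_kernel(a, b, lengthscale=6e-2):
    """Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 * lengthscale^2))."""
    return np.exp(-0.5 * ((a - b) / lengthscale) ** 2)

grid = np.linspace(0.0, 1.2, 241)            # discretized action set
K = se_kernel(grid[:, None], grid[None, :])  # Gram matrix of the RKHS model
best = grid[np.argmax(payoff(grid))]
print(f"best action ~= {best:.3f}, payoff {payoff(best):.3f}")
```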