Likelihood Ratio Confidence Sets for Sequential Decision Making

Authors: Nicolas Emmenegger, Mojmir Mutny, Andreas Krause

NeurIPS 2023

Reproducibility assessment: each entry lists the variable, the result, and the supporting LLM response.
Research Type: Experimental
    "We showcase the practical strength of our method on generalized linear bandit problems, survival analysis, and bandits with various additive noise distributions." (Section 4, "Application: Linear and Kernelized Bandits"; Section 4.2, "Experimental Evaluation".) Figure 2 caption: "Bandit experiments: On the y-axis we report cumulative regret, while the x-axis shows the number of iterations. In a) and b) we report the results for linear models with different parametric additive noise. In c) we report the results on a survival analysis with a log-Weibull distribution (p = 2) and in d) we showcase Poisson bandits."
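Figure 2 plots cumulative regret against the iteration count. For reference, cumulative regret can be computed as in the generic sketch below; the function and variable names are illustrative and not taken from the paper's code.

```python
def cumulative_regret(rewards, best_reward):
    """Cumulative regret: running sum of the per-round gap between the
    best achievable expected reward and the reward actually obtained.
    (Generic sketch; not the authors' implementation.)"""
    regret, total = [], 0.0
    for r_t in rewards:
        total += best_reward - r_t
        regret.append(total)
    return regret

# Example: an agent that pulls a suboptimal arm in round 2 only.
curve = cumulative_regret([1.0, 0.5, 1.0], best_reward=1.0)
```

A flat stretch of the curve corresponds to rounds where the optimal action was played; the curve is non-decreasing whenever `best_reward` upper-bounds the per-round rewards.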
Researcher Affiliation: Academia
    Nicolas Emmenegger (ETH Zürich), Mojmír Mutný (ETH Zürich), Andreas Krause (ETH Zürich)
Pseudocode: Yes
    Algorithm 1: Constructing the LR Confidence Sequence
    1: Input: convex set Θ ⊆ R^d, confidence level α > 0, likelihood p_θ(y | x), regularizers {ψ_t}_t
    2: for t ∈ N_0 do
    3:     θ̂_t = argmin_{θ ∈ Θ} Σ_{s=1}^{t−1} −log p_θ(y_s | x_s) + ψ_t(θ)    ▷ FTRL
    4:     w_t = (1/L) / (1/L + bias_{x_t}(θ̂_t)²) (THIS WORK) or w_t = 1 (CLASSICAL), with the bias-weighting term bias_{x_t}(θ̂_t) as in Eq. (5) or Eq. (6)
    5:     C_t = { θ ∈ Θ : Π_{s=1}^{t} [ p_{θ̂_s}(y_s | x_s) / p_θ(y_s | x_s) ]^{w_s} ≤ 1/α }    ▷ Confidence set
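The confidence-set construction in Algorithm 1 can be illustrated numerically. The sketch below assumes a 1-D linear model with Gaussian noise, the classical weights w_s = 1, a discretized parameter set, and no regularizer; all names are illustrative and this is not the authors' implementation.

```python
import math

def lr_confidence_set(xs, ys, theta_grid, alpha=0.05, sigma=1.0):
    """Sketch of the confidence set in step 5 of Algorithm 1 for a 1-D
    linear model y = theta * x + Gaussian noise (std sigma), with the
    classical weights w_s = 1 and a discretized parameter set Theta.
    Illustrative only -- not the authors' implementation."""
    def log_lik(theta, t):
        # log prod_{s < t} p_theta(y_s | x_s), up to a theta-free constant
        return -sum((ys[s] - theta * xs[s]) ** 2 for s in range(t)) / (2 * sigma ** 2)

    log_ratio = {th: 0.0 for th in theta_grid}  # log of the running LR product
    for t in range(1, len(xs) + 1):
        # Plug-in estimate from past data (FTRL with no regularizer here);
        # the first round uses an arbitrary element of the grid.
        theta_hat = theta_grid[0] if t == 1 else max(
            theta_grid, key=lambda th: log_lik(th, t - 1))
        for th in theta_grid:
            ll_hat = -((ys[t - 1] - theta_hat * xs[t - 1]) ** 2) / (2 * sigma ** 2)
            ll_th = -((ys[t - 1] - th * xs[t - 1]) ** 2) / (2 * sigma ** 2)
            log_ratio[th] += ll_hat - ll_th
    # Keep every theta whose likelihood-ratio product is at most 1/alpha.
    return [th for th in theta_grid if log_ratio[th] <= math.log(1.0 / alpha)]
```

On data generated with a fixed true parameter, the true value stays in the returned set while clearly wrong grid points are eventually excluded as their likelihood-ratio product grows.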
Open Source Code: No
    No explicit statement or link regarding the release of source code for the methodology described in this paper was found.
Open Datasets: Yes
    "The examples in Fig. 2 use the true payoff functions r(x) = (1.4 − 3x) sin(18x), which we model as an element of a RKHS with squared exponential kernel lengthscale γ = 6·10⁻² on [0, 1.2], which is the baseline function no. 4 in the global optimization benchmark database infinity77 (Gavana, 2021)."
    Gavana, A. (2021). infinity: global optimization benchmarks and AMPGO. http://infinity77.net/global_optimization/index.html
Dataset Splits: No
    The paper does not provide specific details on dataset splits (e.g., percentages, sample counts, or explicit predefined split references) needed to reproduce the partitioning of data into training, validation, and test sets.
Hardware Specification: No
    No specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are provided in the paper.
Software Dependencies: No
    The paper does not provide ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup: Yes
    "The examples in Fig. 2 use the true payoff functions r(x) = (1.4 − 3x) sin(18x), which we model as an element of a RKHS with squared exponential kernel lengthscale γ = 6·10⁻² on [0, 1.2]..." and "We include such sets as a baseline without provable coverage as well. The main take-home message from the experiments is that among all the estimators and confidence sets that enjoy provable coverage, our confidence sets perform the best, on par with successful heuristics. For all our numerical experiments in Figure 2, the true payoff function is assumed to be an infinite-dimensional RKHS element. For further details and experiments, please refer to App. E." and "in both cases they are performing as good as heuristic confidence sets with confidence parameter β_t = 2 log(1/δ)."