Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits
Authors: Marc Jourdan, Rémy Degenne
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we propose an asymptotically optimal algorithm for this setting, which is shown to achieve competitive empirical performance against existing modified best-arm identification algorithms. We perform experiments to highlight the empirical performance of the modified BAI algorithms on additive εBAI problems. Moreover, we show that using z_t = z_F(µ_{t−1}, N_{t−1}) in (5) achieves a lower empirical stopping time than z_t ∈ z⋆(µ_{t−1}), and outperforms the ε-gap stopping rule with z_t ∈ z⋆(µ_{t−1}). We consider linear bandits, i.e. K = Z, with M = R^d and (ε, δ) = (0.05, 0.01), and perform 5000 runs. |
| Researcher Affiliation | Academia | Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France. Correspondence to: Marc Jourdan <marc.jourdan@inria.fr>. |
| Pseudocode | Yes | Algorithm 1 LεBAI. Input: History F_t, Z-oracle L^Z and learner L^K. Output: Candidate ε-optimal answer ẑ. Pull once each arm a ∈ K, set n_0 = K and W_{n_0} = 1_K; for t = n_0 + 1, … do: get z_t ∈ z_F(µ_{t−1}, N_{t−1}); if (5) holds for z_t then return z_t; get z_t, w_t^{L^K} from L^Z, L^K; let w_t = 1_K/t + w_t^{L^K} and W_t = W_{t−1} + w_t; closest alternative: λ_t ∈ argmin_{λ ∈ ¬_ε z_t} ‖µ_{t−1} − λ‖²_{V_{w_t}}; optimistic gains: ∀a ∈ K, U_t^a = (‖µ_{t−1} − λ_t‖_{aa^T} + √(c_{t−1}^a))²; feed L^K with gain g_t(w) = (1 − 1/t)⟨w, U_t⟩; pull a_t ∈ argmin_{a ∈ K} (N_{t−1}^a − W_t^a), observe X_t^{a_t}; end for. (A hedged Julia sketch of this loop's structure is given after the table.) |
| Open Source Code | Yes | To assess our code and reproduce the experiments presented in this paper, you need to unzip the provided code by running unzip code.zip. All the algorithms and experiments are implemented in Julia 1.6.3 (but also run on Julia 1.1.1). Plots were generated with the StatsPlots.jl package. Other dependencies are listed in the Readme.md. The Readme.md file provides detailed Julia instructions to reproduce our experiments, as well as a script.sh to run them all at once. The general structure of the code (and some functions) is taken from the tidnabbil library. |
| Open Datasets | No | The paper describes the generation of synthetic datasets for experiments (e.g., "We adapt the usual hard instance...", "Random Instances... For the answer set, 19 vectors (a_k)_{k∈[19]} are uniformly drawn from S^{d−1} := {a ∈ R^d : ‖a‖_2 = 1} and set µ = a_1."), but it does not provide specific links, DOIs, formal citations, or mention of established public benchmark datasets for these instances. The dataset is generated within the experimental setup itself, not provided as a separate, accessible resource. (A Julia sketch of this instance generator is given after the table.) |
| Dataset Splits | No | The paper describes an experimental setup for an online bandit problem, focusing on sequential interaction with a simulated environment rather than a fixed dataset, so no train/validation/test splits are defined or reported. |
| Hardware Specification | No | The Acknowledgements section states: "Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr)." While Grid'5000 is a computing infrastructure, the paper does not specify the exact hardware components (e.g., specific CPU/GPU models, memory sizes) used for the experiments. |
| Software Dependencies | Yes | All the algorithms and experiments are implemented in Julia 1.6.3 (but also run on Julia 1.1.1). Plots were generated with the StatsPlots.jl package. Other dependencies are listed in the Readme.md. |
| Experiment Setup | Yes | We consider linear bandits, i.e. K = Z, with M = R^d and (ε, δ) = (0.05, 0.01), and perform 5000 runs. The stopping-recommendation pair is updated at each time t. The discretization of the simplices △_2 and △_4 (with 500 and 10000 vectors) is obtained by drawing vectors uniformly in the simplex; in practice we used a Dirichlet distribution with all parameters equal to 1. We set the hyper-parameter of XY-Adaptive to 0.1 as done in Soare et al. (2014); Degenne et al. (2020a); it controls the length of each phase. Instead of the stopping threshold (6) supported by the theory, we use as heuristic β(t, δ) = 4 ln((4 + ln(t/2))/δ), where we take the main term and plug in d instead of K. (A Julia sketch of the simplex discretization and threshold heuristic is given after the table.) |
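
To make the Algorithm 1 quote in the Pseudocode row easier to picture, here is a minimal Julia sketch of its outer structure. This is not the authors' implementation: the Z-oracle, the K-learner, the optimistic gains and the GLR stopping test (5) are replaced by toy placeholders (`learner_weights`, `stop`, and a fixed two-armed Gaussian instance are our own names), and only the pull-once initialization, the cumulative weights W_t, and the tracking pull rule follow the quoted pseudocode.

```julia
using LinearAlgebra
using Random

# Toy Gaussian linear instance: the arms are the canonical basis of R^d,
# so the best arm is the coordinate of θ with the largest value.
const d = 2
const arms = [Matrix{Float64}(I, d, d)[:, a] for a in 1:d]  # K = d arms
const θ = [1.0, 0.9]                                        # unknown parameter
const K = length(arms)

sample_reward(rng, a) = dot(arms[a], θ) + randn(rng)

# Placeholder for the K-learner L^K: uniform weights at every round.
learner_weights() = fill(1.0 / K, K)

# Placeholder for the GLR stopping test (5): stop after a fixed budget.
stop(t) = t >= 200

function leps_bai_skeleton(rng)
    N = ones(Int, K)                  # each arm pulled once: n0 = K
    W = ones(Float64, K)              # W_{n0} = 1_K
    sums = [sample_reward(rng, a) for a in 1:K]
    t = K
    while !stop(t)
        t += 1
        w = learner_weights()         # stands in for w_t^{L^K}
        W .+= w                       # W_t = W_{t-1} + w_t
        at = argmin(N .- W)           # tracking: pull argmin_a N^a_{t-1} - W^a_t
        sums[at] += sample_reward(rng, at)
        N[at] += 1
    end
    return argmax(sums ./ N)          # recommend the empirical best arm
end

println(leps_bai_skeleton(MersenneTwister(0)))
```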
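
The "Random Instances" recipe quoted in the Open Datasets row can be reproduced in a few lines of Julia. The function name `random_instance` and the toy dimension below are ours; the only parts taken from the quote are the recipe itself: draw 19 unit vectors uniformly on S^{d−1} (normalizing Gaussian samples gives the uniform distribution on the sphere) and set µ = a_1.

```julia
using LinearAlgebra
using Random

# Draw K unit vectors uniformly on the unit sphere S^{d-1} in R^d and take
# the first one as the mean parameter, µ = a_1.
function random_instance(rng, d::Integer, K::Integer = 19)
    arms = [normalize(randn(rng, d)) for _ in 1:K]
    μ = arms[1]
    return arms, μ
end

arms, μ = random_instance(MersenneTwister(0), 6)
```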
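
Two details of the Experiment Setup row lend themselves to a short sketch: the simplex discretization via a Dirichlet distribution with all parameters equal to 1, and the heuristic stopping threshold. The sketch assumes the threshold takes the form β(t, δ) = 4 ln((4 + ln(t/2))/δ), which is one reading of the quoted expression (the exact constants may differ in the paper), and it uses the Distributions.jl package, which is an assumption rather than a confirmed dependency of the released code.

```julia
using Distributions   # provides the Dirichlet distribution
using Random

# Uniform discretization of the probability simplex over K arms:
# independent draws from Dirichlet(1, ..., 1) are uniform on the simplex.
discretize_simplex(rng, K::Integer, n::Integer) =
    [rand(rng, Dirichlet(K, 1.0)) for _ in 1:n]

# Heuristic stopping threshold (assumed form, see the note above); the
# theoretical threshold (6) from the paper is not reproduced here.
heuristic_threshold(t, δ) = 4 * log((4 + log(t / 2)) / δ)

grid = discretize_simplex(MersenneTwister(0), 4, 500)   # 500 weight vectors
β = heuristic_threshold(1_000, 0.01)                    # δ = 0.01 as in the setup
```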