Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits
Authors: Marc Jourdan, Rémy Degenne
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we propose an asymptotically optimal algorithm for this setting, which is shown to achieve competitive empirical performance against existing modified best-arm identification algorithms. We perform experiments to highlight the empirical performance of the modified BAI algorithms on additive εBAI problems. Moreover, we show that using z_t = z_F(µ_{t−1}, N_{t−1}) in (5) achieves a lower empirical stopping time than z_t ∈ z⋆(µ_{t−1}), and outperforms the ε-gap stopping rule with z_t ∈ z⋆(µ_{t−1}). We consider linear bandits, i.e. K = Z, with M = R^d and (ε, δ) = (0.05, 0.01), and perform 5000 runs. |
| Researcher Affiliation | Academia | Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France. Correspondence to: Marc Jourdan <marc.jourdan@inria.fr>. |
| Pseudocode | Yes | Algorithm 1 LεBAI. Input: History F_t, Z-oracle L^Z and learner L^K. Output: Candidate ε-optimal answer ẑ. Pull once each arm a ∈ K, set n_0 = K and W_{n_0} = 1_K; for t = n_0 + 1, … do: get z_t ∈ z_F(µ_{t−1}, N_{t−1}); if (5) holds for z_t then return z_t; get z_t, w_t^{L^K} from L^Z, L^K; let w_t = 1_K/t + w_t^{L^K} and W_t = W_{t−1} + w_t; closest alternative: λ_t ∈ argmin_{λ ∈ ¬_ε z_t} ‖µ_{t−1} − λ‖²_{V_{w_t}}; optimistic gains: ∀a ∈ K, U_t^a = (‖µ_{t−1} − λ_t‖_{aa^T} + √(c_{t−1}^a))²; feed L^K with gain g_t(w) = (1 − 1/t)⟨w, U_t⟩; pull a_t ∈ argmin_{a ∈ K} (N_{t−1}^a − W_t^a), observe X_t^{a_t}; end for. (A hedged Julia sketch of this loop's structure is given after the table.) |
| Open Source Code | Yes | To assess our code and reproduce the experiments presented in this paper, you need to unzip the provided code by running unzip code.zip. All the algorithms and experiments are implemented in Julia 1.6.3 (but also run on Julia 1.1.1). Plots were generated with the StatsPlots.jl package. Other dependencies are listed in the Readme.md. The Readme.md file provides detailed Julia instructions to reproduce our experiments, as well as a script.sh to run them all at once. The general structure of the code (and some functions) is taken from the tidnabbil library. |
| Open Datasets | No | The paper describes the generation of synthetic datasets for experiments (e.g., "We adapt the usual hard instance...", "Random Instances... For the answer set, 19 vectors (a_k)_{k∈[19]} are uniformly drawn from S^{d−1} := {a ∈ R^d : ‖a‖_2 = 1} and set µ = a_1."), but it does not provide specific links, DOIs, formal citations, or mention of established public benchmark datasets for these instances. The dataset is generated within the experimental setup itself, not provided as a separate, accessible resource. (A Julia sketch of this instance generator is given after the table.) |
| Dataset Splits | No | The paper describes an experimental setup for an online bandit problem, focusing on sequential interaction with a simulated environment rather than a fixed dataset, so no train/validation/test splits are defined or reported. |
| Hardware Specification | No | The Acknowledgements section states: "Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr)." While Grid'5000 is a computing infrastructure, the paper does not specify the exact hardware components (e.g., specific CPU/GPU models, memory sizes) used for the experiments. |
| Software Dependencies | Yes | All the algorithms and experiments are implemented in Julia 1.6.3 (but also run on Julia 1.1.1). Plots were generated with the StatsPlots.jl package. Other dependencies are listed in the Readme.md. |
| Experiment Setup | Yes | We consider linear bandits, i.e. K = Z, with M = R^d and (ε, δ) = (0.05, 0.01), and perform 5000 runs. The stopping-recommendation pair is updated at each time t. The discretization of the simplices △_2 and △_4 (with 500 and 10000 vectors) is obtained by drawing vectors uniformly in the simplex; in practice we used a Dirichlet distribution with all parameters equal to 1. We set the hyper-parameter of XY-Adaptive to 0.1 as done in Soare et al. (2014); Degenne et al. (2020a); it controls the length of each phase. Instead of the stopping threshold (6) supported by the theory, we use as heuristic β(t, δ) = 4 ln((4 + ln(t/2))/δ), where we take the main term and plug in d instead of K. (A Julia sketch of the simplex discretization and threshold heuristic is given after the table.) |
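
To make the Algorithm 1 quote in the Pseudocode row easier to picture, here is a minimal Julia sketch of its outer structure. This is not the authors' implementation: the Z-oracle, the K-learner, the optimistic gains and the GLR stopping test (5) are replaced by toy placeholders (`learner_weights`, `stop`, and a fixed two-armed Gaussian instance are our own names), and only the pull-once initialization, the cumulative weights W_t, and the tracking pull rule follow the quoted pseudocode.

```julia
using LinearAlgebra
using Random

# Toy Gaussian linear instance: the arms are the canonical basis of R^d,
# so the best arm is the coordinate of θ with the largest value.
const d = 2
const arms = [Matrix{Float64}(I, d, d)[:, a] for a in 1:d]  # K = d arms
const θ = [1.0, 0.9]                                        # unknown parameter
const K = length(arms)

sample_reward(rng, a) = dot(arms[a], θ) + randn(rng)

# Placeholder for the K-learner L^K: uniform weights at every round.
learner_weights() = fill(1.0 / K, K)

# Placeholder for the GLR stopping test (5): stop after a fixed budget.
stop(t) = t >= 200

function leps_bai_skeleton(rng)
    N = ones(Int, K)                  # each arm pulled once: n0 = K
    W = ones(Float64, K)              # W_{n0} = 1_K
    sums = [sample_reward(rng, a) for a in 1:K]
    t = K
    while !stop(t)
        t += 1
        w = learner_weights()         # stands in for w_t^{L^K}
        W .+= w                       # W_t = W_{t-1} + w_t
        at = argmin(N .- W)           # tracking: pull argmin_a N^a_{t-1} - W^a_t
        sums[at] += sample_reward(rng, at)
        N[at] += 1
    end
    return argmax(sums ./ N)          # recommend the empirical best arm
end

println(leps_bai_skeleton(MersenneTwister(0)))
```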
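
The "Random Instances" recipe quoted in the Open Datasets row can be reproduced in a few lines of Julia. The function name `random_instance` and the toy dimension below are ours; the only parts taken from the quote are the recipe itself: draw 19 unit vectors uniformly on S^{d−1} (normalizing Gaussian samples gives the uniform distribution on the sphere) and set µ = a_1.

```julia
using LinearAlgebra
using Random

# Draw K unit vectors uniformly on the unit sphere S^{d-1} in R^d and take
# the first one as the mean parameter, µ = a_1.
function random_instance(rng, d::Integer, K::Integer = 19)
    arms = [normalize(randn(rng, d)) for _ in 1:K]
    μ = arms[1]
    return arms, μ
end

arms, μ = random_instance(MersenneTwister(0), 6)
```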
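
Two details of the Experiment Setup row lend themselves to a short sketch: the simplex discretization via a Dirichlet distribution with all parameters equal to 1, and the heuristic stopping threshold. The sketch assumes the threshold takes the form β(t, δ) = 4 ln((4 + ln(t/2))/δ), which is one reading of the quoted expression (the exact constants may differ in the paper), and it uses the Distributions.jl package, which is an assumption rather than a confirmed dependency of the released code.

```julia
using Distributions   # provides the Dirichlet distribution
using Random

# Uniform discretization of the probability simplex over K arms:
# independent draws from Dirichlet(1, ..., 1) are uniform on the simplex.
discretize_simplex(rng, K::Integer, n::Integer) =
    [rand(rng, Dirichlet(K, 1.0)) for _ in 1:n]

# Heuristic stopping threshold (assumed form, see the note above); the
# theoretical threshold (6) from the paper is not reproduced here.
heuristic_threshold(t, δ) = 4 * log((4 + log(t / 2)) / δ)

grid = discretize_simplex(MersenneTwister(0), 4, 500)   # 500 weight vectors
β = heuristic_threshold(1_000, 0.01)                    # δ = 0.01 as in the setup
```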