Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits
Authors: Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we report numerical simulations performed on synthetic data to compare the performance of GLR-kl UCB against other state-of-the-art approaches. Experiments were performed with a library written in the Julia language which is available online.4 |
| Researcher Affiliation | Collaboration | Lilian Besson EMAIL ENS Rennes, IRISA, Inria Rennes, France Emilie Kaufmann EMAIL Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France Odalric-Ambrym Maillard EMAIL Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France Julien Seznec EMAIL Inria and Lelivrescolaire.fr Editions |
| Pseudocode | Yes | Algorithm 1: GLR-kl UCB (Local or Global restarts) |
| Open Source Code | Yes | Experiments were performed with a library written in the Julia language which is available online.4 4 https://github.com/EmilieKaufmann/PiecewiseStationaryBandits |
| Open Datasets | No | We design two simple piecewise stationary bandit problems with A = 3 arms and ΥT = 4 breakpoints. To generate a random instance, we specify the number of arms A, the maximal number of breakpoints Υ, a change-point probability p, a minimal distance dmin, a minimal and maximal amount of change, min and max. Then, we sample the breakpoints uniformly at random under the constraint that τ k τ k 1 dmin. For each breakpoint k, each arm has a probability p to experience a change-point, whose magnitude is chosen uniformly at random in [ min, max]. |
| Dataset Splits | No | The paper primarily uses synthetic data generated according to specified parameters (e.g., number of arms, breakpoints, magnitudes) for different values of the horizon T. It does not involve traditional dataset splits like training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments or simulations. |
| Software Dependencies | No | Experiments were performed with a library written in the Julia language which is available online.4 While the programming language 'Julia' is mentioned, no specific version number or other software dependencies with their versions are provided for reproducibility. |
| Experiment Setup | Yes | For all the algorithms, we performed the tuning recommended in the corresponding paper, using in particular the knowledge of the number of breakpoints ΥT and the horizon T when needed. Only two algorithms do not require the knowledge of ΥT : Ad Switch and GLR-kl UCB. We experiment with Exp3.S (with theoretically optimal tuning in Corollary 8.3 of Auer et al. (2002b)), and the two passively adaptive algorithms Discounted kl UCB (D-kl UCB) with discount factor γ = 1 p ΥT /T/4 and Sliding-Window kl UCB (SW-kl UCB) with window-size τ = 2 p T ln(T)/ΥT ... we set α = p ΥT A ln(T)/T for all algorithms using a constant exploration probability and αk = p k A ln(T)/T for the exploration sequence of GLR-kl UCB. Regarding the parameters of the change-point detectors, we use a threshold h = ln(T/ΥT ) for CUSUM-kl UCB, as recommended by Liu et al. (2018), and experience with different values of (M, ϵ) that have to be tuned using some prior knowledge of the problem. For M-kl UCB, we experiment with different values of the windows parameter w (often choosing the tuning w = 800 that was found to be robust in the experiments of Cao et al. (2019)) and use the recommended threshold w ln(2AT 2)/2. For the Bernoulli-GLR test, we use the threshold function β(n, δ) = ln(n3/2/δ) and set δ = 1/ T, which is the largest value licensed by Corollary 6. For GLR-kl UCB and Ad Switch, which are computationally more demanding due to the tests based on scan-statistics, we use some implementation tweaks. For GLR-kl UCB, we use some down-sampling as discussed in Section 3.3, performing the test only every t = 10 time steps and scanning every s = 5 observations for a possible change-point. To be able to implement Ad Switch up to a horizon T = 5000, we set t = 50. ... Finally, the parameter C1 that governs the elimination of good arms and should be chosen large enough was set to C1 = 1. |