Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

Authors: Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we report numerical simulations performed on synthetic data to compare the performance of GLR-kl UCB against other state-of-the-art approaches. Experiments were performed with a library written in the Julia language which is available online.4
Researcher Affiliation	Collaboration	Lilian Besson EMAIL ENS Rennes, IRISA, Inria Rennes, France Emilie Kaufmann EMAIL Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France Odalric-Ambrym Maillard EMAIL Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France Julien Seznec EMAIL Inria and Lelivrescolaire.fr Editions
Pseudocode	Yes	Algorithm 1: GLR-kl UCB (Local or Global restarts)
Open Source Code	Yes	Experiments were performed with a library written in the Julia language which is available online.4 4 https://github.com/EmilieKaufmann/PiecewiseStationaryBandits
Open Datasets	No	We design two simple piecewise stationary bandit problems with A = 3 arms and ΥT = 4 breakpoints. To generate a random instance, we specify the number of arms A, the maximal number of breakpoints Υ, a change-point probability p, a minimal distance dmin, a minimal and maximal amount of change, min and max. Then, we sample the breakpoints uniformly at random under the constraint that τ k τ k 1 dmin. For each breakpoint k, each arm has a probability p to experience a change-point, whose magnitude is chosen uniformly at random in [ min, max].
Dataset Splits	No	The paper primarily uses synthetic data generated according to specified parameters (e.g., number of arms, breakpoints, magnitudes) for different values of the horizon T. It does not involve traditional dataset splits like training, validation, or test sets.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments or simulations.
Software Dependencies	No	Experiments were performed with a library written in the Julia language which is available online.4 While the programming language 'Julia' is mentioned, no specific version number or other software dependencies with their versions are provided for reproducibility.
Experiment Setup	Yes	For all the algorithms, we performed the tuning recommended in the corresponding paper, using in particular the knowledge of the number of breakpoints ΥT and the horizon T when needed. Only two algorithms do not require the knowledge of ΥT : Ad Switch and GLR-kl UCB. We experiment with Exp3.S (with theoretically optimal tuning in Corollary 8.3 of Auer et al. (2002b)), and the two passively adaptive algorithms Discounted kl UCB (D-kl UCB) with discount factor γ = 1 p ΥT /T/4 and Sliding-Window kl UCB (SW-kl UCB) with window-size τ = 2 p T ln(T)/ΥT ... we set α = p ΥT A ln(T)/T for all algorithms using a constant exploration probability and αk = p k A ln(T)/T for the exploration sequence of GLR-kl UCB. Regarding the parameters of the change-point detectors, we use a threshold h = ln(T/ΥT ) for CUSUM-kl UCB, as recommended by Liu et al. (2018), and experience with different values of (M, ϵ) that have to be tuned using some prior knowledge of the problem. For M-kl UCB, we experiment with different values of the windows parameter w (often choosing the tuning w = 800 that was found to be robust in the experiments of Cao et al. (2019)) and use the recommended threshold w ln(2AT 2)/2. For the Bernoulli-GLR test, we use the threshold function β(n, δ) = ln(n3/2/δ) and set δ = 1/ T, which is the largest value licensed by Corollary 6. For GLR-kl UCB and Ad Switch, which are computationally more demanding due to the tests based on scan-statistics, we use some implementation tweaks. For GLR-kl UCB, we use some down-sampling as discussed in Section 3.3, performing the test only every t = 10 time steps and scanning every s = 5 observations for a possible change-point. To be able to implement Ad Switch up to a horizon T = 5000, we set t = 50. ... Finally, the parameter C1 that governs the elimination of good arms and should be chosen large enough was set to C1 = 1.