Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Optimal Online Change Detection via Random Fourier Features

Authors: Florian Kalinke, Shakeel Gavioli-Akilagun

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We collect our experiments on synthetic data in Section 6.1 and on the MNIST data set in Section 6.2. We refer to Appendix A.3 for additional experiments and a numerical comparison of different thresholds for the stopping rule. To interpret the change point detection performance of the proposed method, we compare its average run length (ARL) and expected detection delay (EDD) to the existing kernel-based methods presented in Table 1.
Researcher Affiliation	Academia	Florian Kalinke, Information Systems, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany; Shakeel Gavioli-Akilagun, Department of Decision Analytics and Operations, City University Hong Kong, Hong Kong, China
Pseudocode	Yes	We show the pseudo code of our proposed method in Algorithm 1; see also Example 1 and Figure 1 for a summary.
Open Source Code	Yes	All code replicating our experiments is available in the supplement and at https://github.com/FlopsKa/rff-change-detection.
Open Datasets	Yes	Empirical validation: We perform a suite of benchmarks on synthetic data, the MNIST data, and the HASC data to demonstrate the applicability of the proposed method. (...) This section collects our experiments on the Human Activity Sensing Consortium (HASC; available at http://hasc.jp/hc2011/) challenge 2011 data set, which is also considered in Liu et al. [35], Li et al. [33], Wei and Xie [59]. (...) we additionally run the proposed method on the M17-4 sample (illustrated in Figure 6) of pianist pid50534-05 of their Mazurka BL [27] data set with the goal of detecting these changes.
Dataset Splits	Yes	For approximating the EDD of each algorithm, we draw 64 samples from P, respectively, before sampling from Q; we report the average over 100 repetitions. (...) For preprocessing, we order the corresponding csv-files in the data set lexicographically, omitting the first 1 596 (detailed below) samples of walking, and then concatenating 100 walking observations and 100 staying observations to obtain a total of 10 data sets (with a single change point each). (...) To obtain an EDD estimate, we sample and process 512 observations from MNIST digit 0 (pre-change) and 1 024 samples from digits 1–9 (post-change), respectively, averaging the detection delay over 100 repetitions.
Hardware Specification	Yes	All results were obtained on a PC with Ubuntu 20.04 LTS, 124GB RAM, and 32 cores with 2GHz each.
Software Dependencies	No	All results were obtained on a PC with Ubuntu 20.04 LTS, 124GB RAM, and 32 cores with 2GHz each. (...) for the density ratio-based (i.e., non-kernel-based) RuLSIF algorithm which showed the best performance on HASC in Liu et al. [35], we use the Python changepoynt implementation and consider the l2-norm of each three-dimensional observation. The changepoynt library is available at https://github.com/Lucew/changepoynt.
Experiment Setup	Yes	Matching the settings of the reproduced experiment, we choose Bmax = 50 and N = 15 for online kernel CUSUM; for Scan B-statistics and NewMA, we set B0 = 50. The remaining parameters of NewMA then follow from the heuristics detailed by the authors [24]. For Online RFF-MMD, we set r = 1 000. We compute the thresholds for a given target ARL by processing 10 (target ARL) samples with each algorithm, repeating for 100 Monte Carlo (MC) iterations, and computing the 1 - 1/(target ARL) quantile of the resulting test statistics. (...) All kernel-based approaches use the (approximated) Gaussian kernel with the γ > 0 parameter set by the median heuristic [12] or its RFF approximation, depending on the algorithm.
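
The threshold calibration quoted under Experiment Setup can be sketched as follows. This is a minimal illustration, not the authors' code: it reads "10 (target ARL) samples" as ten times the target ARL, pools the statistics from all Monte Carlo runs before taking the quantile (an assumption), and uses a sliding-window mean as a stand-in for the actual detectors. The names `calibrate_threshold`, `windowed_mean_stat`, and `sample_standard_normal` are hypothetical.

```python
import numpy as np


def windowed_mean_stat(x, window=50):
    """Stand-in test statistic (NOT Online RFF-MMD): absolute sliding-window mean."""
    kernel = np.ones(window) / window
    return np.abs(np.convolve(x, kernel, mode="valid"))


def sample_standard_normal(rng, n):
    """Hypothetical pre-change distribution P: i.i.d. standard normal samples."""
    return rng.standard_normal(n)


def calibrate_threshold(statistic_fn, sample_null, target_arl,
                        n_mc=100, length_factor=10, seed=0):
    """Monte Carlo threshold for a target average run length (ARL).

    Each run processes length_factor * target_arl pre-change samples.
    The threshold is the (1 - 1/target_arl) quantile of the statistics
    pooled over all runs (pooling is an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n = length_factor * target_arl
    pooled = np.concatenate(
        [statistic_fn(sample_null(rng, n)) for _ in range(n_mc)]
    )
    return float(np.quantile(pooled, 1.0 - 1.0 / target_arl))


threshold = calibrate_threshold(
    windowed_mean_stat, sample_standard_normal, target_arl=100
)
print(f"threshold for target ARL 100: {threshold:.3f}")
```

Replacing `windowed_mean_stat` with a function that returns one detector statistic per time step would apply the same calibration to any of the compared algorithms; the fixed seed only makes the sketch deterministic.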