Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Random Forests for Change Point Detection

Authors: Malte Londschien, Peter Bühlmann, Solt Kovács

JMLR 2023

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Our proposed method changeforest achieves improved empirical performance in an extensive simulation study compared to existing multivariate nonparametric change point detection methods."

Researcher Affiliation | Academia | "Malte Londschien (EMAIL), Peter Bühlmann (EMAIL), Solt Kovács (EMAIL), Seminar for Statistics, ETH Zürich, 8092 Zürich, Switzerland"

Pseudocode | Yes | "Algorithm 1: changeforest; Algorithm 2: Two-Step Search; Algorithm 3: Model Selection"

Open Source Code | Yes | "An efficient implementation of our method is made available for R, Python, and Rust users in the changeforest software package. ... For inquiries, installation instructions, a tutorial, and more information, please visit github.com/mlondschien/changeforest."

Open Datasets | Yes | "We use the following data sets: the iris flower data set (Anderson, 1936)... the glass identification data set (Evett and Spiehler, 1989)... the wine data set (Cortez et al., 2009)... the Wisconsin breast cancer data set (Street et al., 1993)... the abalone data set (Waugh, 1995)... the dry beans data set (Koklu and Ozkan, 2020)..."

Dataset Splits | Yes | "For the change in mean and change in covariance setups, we generate time series of dimension d = 5 with n = 600 observations and change points at t = 200, 400. ... We simulate 500 data sets with K = 20, 80 and vary n = 250, 354, 500, 707, 1000, ..., 64000 for each setup. ... We use round(n·Nk) as segment lengths, where we round such that Σ_{k=1}^{K} round(n·Nk) = n."

Hardware Specification | Yes | "Simulations were run on eight Intel Xeon 2.3 GHz cores with 4 GB of RAM available per core (32 GB in total)."

Software Dependencies | No | "The changeforest package is available for Python users on PyPI and conda-forge (conda-forge community, 2015), for R users on conda-forge, and for Rust users on crates.io. Its backend is implemented in the systems programming language Rust (Matsakis and Klock, 2014). ... An implementation of ECP is available through the R package ecp (James and Matteson, 2015)... An efficient implementation can be found in the Python package ruptures (Truong et al., 2020)." Specific version numbers for Python, R, Rust, or any of the mentioned packages (ecp, ruptures) are not provided.

Experiment Setup | Yes | "With η = exp(−6), we effectively cap individual classifier log-likelihood ratios from below by −6. ... The resulting two-step algorithm, as implemented in changeforest, using three initial guesses at the segment's 25%, 50%, and 75% quantiles is presented in Algorithm 2. ... For ECP, this results in α = 1 and using 199 permutations at a significance level of 0.05. For KCP, we use a Gaussian kernel with a bandwidth of 0.1 after normalization by the median absolute deviation of absolute consecutive differences; see Section 4.2. This choice of bandwidth was optimal for the simulated scenarios; see also Section 4.6 and Table 9. For MNWBS, we used the bandwidth 5·(n·log(n)/δ)^{1/p} as proposed by Madrid Padilla et al. (2021b). We used 50 random intervals to reduce the computational cost."
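Per the Open Source Code row, the package is distributed for Python users on PyPI and conda-forge. Assuming the PyPI distribution name matches the package name (the quoted GitHub page has the authoritative instructions), installation reduces to a one-liner:

```shell
# Python users (PyPI); see github.com/mlondschien/changeforest
# for R and Rust installation instructions.
pip install changeforest
```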
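The Dataset Splits row uses round(n·Nk) as segment lengths, rounded so that the lengths sum to n. Naive rounding need not satisfy that constraint; the paper does not spell out its correction, so the sketch below uses one plausible scheme (largest-remainder adjustment), with the function name and signature our own:

```python
import numpy as np

def segment_lengths(n, weights):
    """Split n observations into integer segment lengths ~ round(n * w_k).

    Plain round(n * w_k) need not sum to n, so we adjust the entries with
    the largest rounding residuals (largest-remainder method; one plausible
    scheme, not necessarily the paper's exact correction).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # normalize to a distribution
    raw = n * weights
    lengths = np.round(raw).astype(int)
    deficit = n - lengths.sum()            # observations still to place (or remove)
    order = np.argsort(raw - lengths)      # residuals, ascending
    if deficit > 0:
        lengths[order[-deficit:]] += 1     # bump segments rounded down the most
    elif deficit < 0:
        lengths[order[:-deficit]] -= 1     # trim segments rounded up the most
    return lengths
```

For the simulated d = 5 series with n = 600 and change points at t = 200, 400, equal weights recover three segments of length 200 each.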
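The Experiment Setup row caps individual classifier log-likelihood ratios from below by −6 via η = exp(−6). One way to realize such a cap is to floor predicted probabilities at η before taking logs; the snippet below illustrates only that clipping step (the probabilities are hypothetical, and changeforest's full gain computation is more involved):

```python
import numpy as np

eta = np.exp(-6.0)                       # probability floor, so log(p) >= -6
p = np.array([0.9, 1e-8, 0.3])           # hypothetical classifier probabilities
log_p = np.log(np.clip(p, eta, None))    # flooring at eta bounds each log term below by -6
assert log_p.min() >= -6.0 - 1e-9
```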