Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Anomaly Detection by an Ensemble of Random Pairs of Hyperspheres

Authors: Walid Durani, Collin Leiber, Khalid Durani, Claudia Plant, Christian Böhm

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on diverse real-world datasets show that ADERH consistently outperforms state-of-the-art methods while maintaining linear runtime scalability and stable performance across varying hyperparameter settings. (Section 4, "Experiments")
Researcher Affiliation	Academia	(1) LMU Munich, Munich Center for Machine Learning (MCML), Munich, Germany; (2) Aalto University, Espoo, Finland; (3) University of Helsinki, Helsinki, Finland; (4) University of Innsbruck, Innsbruck, Austria; (5) Faculty of Computer Science and (6) ds:UniVie, University of Vienna, Vienna, Austria
Pseudocode	Yes	Algorithm 1: ADERH
Open Source Code	Yes	Code is available at https://github.com/Walid10010/ADERH.git.
Open Datasets	Yes	Real-world datasets were sourced from the AdBenchmark repository Han et al. [2022], with MNIST-Variation and AD-Variation datasets derived using ResNet18 features pre-trained on ImageNet Han et al. [2022]. Table 6 summarizes dataset statistics.
Dataset Splits	Yes	We perform a stratified 70%/30% train-test split that preserves the anomaly ratio, and normalize all features to the [0, 1] range using a MinMaxScaler [Pedregosa et al., 2011]. Experiments are repeated on three stratified splits.
Hardware Specification	Yes	Experiments were conducted on an Intel Core i7-10700K, 3.8 GHz, 32 GB RAM, with runtime averaged over ten consecutive runs.
Software Dependencies	No	Implementations were obtained from Zhao et al. [2019b], Xu et al. [2023a]. All methods (including ours) are implemented in Python, and we use the public repositories Zhao et al. [2019b], Xu et al. [2023a] for baseline implementations. While Python is mentioned, no version is given for Python or for any library.
Experiment Setup	Yes	In ADERH, we fix two principal parameters: (i) n, the number of random subsets (the ensemble size), and (ii) ω, the size of each random subset. ... We adopt n = 256 in a similar spirit: ... and we set ω = 18. For competitors, we used the default parameter settings as specified in the respective papers (Table 7).
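
The protocol quoted in the Dataset Splits row (a stratified 70%/30% split, min-max scaling to [0, 1], three repetitions) can be sketched as follows. This is a minimal illustration, assuming scikit-learn and NumPy are installed. The synthetic data, the seeds, and the helper name `make_splits` are hypothetical, and fitting the scaler on the training portion only is an assumption, since the quoted text does not say where the scaler is fit.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def make_splits(X, y, seeds=(0, 1, 2), test_size=0.3):
    """Yield one stratified, min-max-scaled split per seed (hypothetical helper)."""
    for seed in seeds:
        # stratify=y keeps the anomaly ratio equal in both portions
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        scaler = MinMaxScaler()  # default feature_range is (0, 1)
        # Assumption: fit on the training portion only, then reuse on the test portion
        X_tr = scaler.fit_transform(X_tr)
        X_te = scaler.transform(X_te)  # test values may fall slightly outside [0, 1]
        yield X_tr, X_te, y_tr, y_te


# Synthetic stand-in data: 1000 points, 5 features, 10% labeled as anomalies
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.zeros(1000, dtype=int)
y[:100] = 1

splits = list(make_splits(X, y))
```

Each element of `splits` holds the scaled training and test features together with their labels; varying the seeds gives the three repetitions mentioned in the row.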
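
The two parameters named in the Experiment Setup row describe an ensemble of n = 256 random subsets, each of size ω = 18. The sketch below only illustrates drawing such subsets of point indices. It assumes uniform sampling without replacement within each subset, which the quoted text does not state, and it does not implement ADERH's hypersphere construction or scoring. The function name `draw_subsets` and the dataset size are hypothetical.

```python
import random

N_SUBSETS = 256   # n: number of random subsets (ensemble size), from the quoted setup
SUBSET_SIZE = 18  # omega: size of each random subset, from the quoted setup


def draw_subsets(num_points, n=N_SUBSETS, omega=SUBSET_SIZE, seed=0):
    """Draw n index subsets of size omega (hypothetical helper).

    Assumption: indices are sampled uniformly without replacement
    within each subset; subsets are drawn independently of each other.
    """
    rng = random.Random(seed)
    return [rng.sample(range(num_points), omega) for _ in range(n)]


# Hypothetical dataset with 1000 points
subsets = draw_subsets(num_points=1000)
```

With these settings the ensemble touches at most 256 × 18 = 4608 sampled indices in total, regardless of the dataset size.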