Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching

Authors: Zhong Li, Qi Huang, Yuxuan Zhu, Lincen Yang, Mohammad Mohammadi Amiri, Niki van Stein, Matthijs van Leeuwen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on the ADBench benchmark show that TCCM strikes a favorable balance between detection accuracy and inference cost, outperforming state-of-the-art methods especially on high-dimensional and large-scale datasets. The source code is provided at https://github.com/ZhongLIFR/TCCM-NIPS.
Researcher Affiliation	Academia	Zhong Li, Qi Huang, Yuxuan Zhu, Lincen Yang (B), Mohammad Mohammadi Amiri, Niki van Stein, Matthijs van Leeuwen. The Leiden Institute of Advanced Computer Science (LIACS), Leiden University; Department of Computer Science, Rensselaer Polytechnic Institute; The Intelligent Computing Research Center, Great Bay University. Corresponding author (B): EMAIL (Lincen Yang)
Pseudocode	Yes	The pseudocode for training is given in Algorithm 1 in Appendix B.5. The pseudocode for inference is given in Algorithm 2 in Appendix B.5.
Open Source Code	Yes	The source code is provided at https://github.com/ZhongLIFR/TCCM-NIPS. Code available at: https://github.com/ZhongLIFR/TCCM-NIPS
Open Datasets	Yes	We evaluate TCCM on 47 benchmark datasets from the ADBench suite (Han et al., 2022)... A summary of the datasets used in our study is provided in Table 1. We adopt 47 benchmark datasets from the well-established ADBENCH benchmark (Han et al., 2022)
Dataset Splits	Yes	We adopt a semi-supervised anomaly detection setting, where models are trained solely on normal instances. Specifically, we apply a stratified split to the normal data, using 50% for training and holding out the rest for testing. The test set includes both normal and anomalous samples.
Hardware Specification	Yes	All experiments are conducted on machines equipped with Intel Xeon Gold 6430 CPUs (3.4 GHz, same model across runs, though not necessarily the same physical unit) and 256 GB RAM. No GPU acceleration is used. To ensure a fair comparison, each model is restricted to run on a single CPU core, allocated up to 10 GB of RAM, and a maximum runtime of 3 days per dataset.
Software Dependencies	Yes	Our implementation is based on Python 3.9.21 with PyTorch 2.0, and experiments are executed within a conda-managed environment running Ubuntu 22.04.
Experiment Setup	Yes	The time-conditioned velocity field fθ(x, t) is parameterized by a 3-layer multilayer perceptron (MLP), where each hidden layer contains 256 units followed by ReLU activations. To incorporate time information, we use a fixed sinusoidal embedding of the scalar time input t ∈ [0, 1], following the positional encoding scheme used in transformer models (Vaswani et al., 2017). The time embedding (default dimension: 128) is concatenated with the input vector x, and the combined representation is passed through the MLP to produce the predicted flow vector. We use the Adam optimizer with a learning rate of 0.005. The batch size is set to 1024 for datasets with more than 10,000 samples and to min(512, #training instances) for smaller datasets. The number of training epochs is determined empirically using the unsupervised hyperparameter selection method proposed by Li et al. (2025b), which requires no access to anomaly labels. While their method supports per-seed tuning, for consistency and fair evaluation, we fix the number of epochs across different random seeds. Notably, thanks to the efficiency of TCCM, tuning this single hyperparameter incurs minimal computational overhead. This is also the only data-dependent hyperparameter in our setup. The choices of key hyperparameters for our TCCM are presented in Table 5.
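
The Dataset Splits and Experiment Setup rows describe three small procedures that may be easier to follow in code. The following is a minimal, standard-library-only sketch and not the authors' implementation: all function names are hypothetical, the split uses a plain seeded shuffle because the quoted text does not say what the "stratified" split is stratified on, the 10,000-sample threshold is assumed to refer to the full dataset size, and the base period of 10,000 in the time embedding is taken from the transformer positional encoding the text cites rather than from the paper itself.

```python
import math
import random


def split_semi_supervised(labels, train_fraction=0.5, seed=0):
    """Split indices for semi-supervised anomaly detection.

    labels: sequence of 0 (normal) / 1 (anomalous).
    A fraction of the normal samples is used for training; the remaining
    normal samples plus all anomalies form the test set.
    Assumption: a plain seeded shuffle stands in for the stratified split.
    """
    rng = random.Random(seed)
    normal = [i for i, y in enumerate(labels) if y == 0]
    anomalous = [i for i, y in enumerate(labels) if y == 1]
    rng.shuffle(normal)
    n_train = int(len(normal) * train_fraction)
    train_idx = sorted(normal[:n_train])
    test_idx = sorted(normal[n_train:] + anomalous)
    return train_idx, test_idx


def choose_batch_size(n_samples, n_train):
    """Batch-size rule: 1024 above 10,000 samples, else min(512, n_train).

    Assumption: n_samples is the size of the whole dataset.
    """
    if n_samples > 10_000:
        return 1024
    return min(512, n_train)


def sinusoidal_time_embedding(t, dim=128, max_period=10_000.0):
    """Fixed sinusoidal embedding of a scalar t in [0, 1].

    Follows the transformer positional-encoding pattern: interleaved
    sin/cos pairs at geometrically spaced frequencies.
    Assumption: max_period=10000, as in the original transformer encoding.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    embedding = []
    for k in range(half):
        freq = max_period ** (-k / half)
        embedding.append(math.sin(t * freq))
        embedding.append(math.cos(t * freq))
    return embedding


if __name__ == "__main__":
    labels = [0] * 10 + [1] * 3
    train_idx, test_idx = split_semi_supervised(labels)
    print(len(train_idx), len(test_idx))
    print(choose_batch_size(n_samples=800, n_train=300))
    print(len(sinusoidal_time_embedding(0.25)))
```

In a full model, the embedding returned by `sinusoidal_time_embedding` would be concatenated with the input vector x before the MLP, as the Experiment Setup row describes; that step is omitted here to keep the sketch free of framework dependencies.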