Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Normal-Abnormal Guided Generalist Anomaly Detection

Authors: Yuexin Wang, Xiaolei Wang, Yizheng Gong, Jimin Xiao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across multiple benchmarks demonstrate that our method significantly outperforms existing GAD approaches.
Researcher Affiliation	Academia	Yuexin Wang (1,2), Xiaolei Wang (1,2), Yizheng Gong (1,2), Jimin Xiao (1); (1) Xi'an Jiaotong-Liverpool University, (2) University of Liverpool
Pseudocode	No	The paper describes the methodology using textual explanations, mathematical equations, and block diagrams (Figure 2), but does not include a dedicated pseudocode block or algorithm listing.
Open Source Code	Yes	The code and datasets are available at https://github.com/JasonKyng/NAGL.
Open Datasets	Yes	To validate the efficiency of our NAGL framework, we construct three benchmarks using the MVTec AD [2], VisA [69], and BraTS [45] datasets.
Dataset Splits	Yes	During training phase, we organize the data into many episodes, where each episode consists of a reference set $R = \{R_n, R_a\}$ and a query input $x_q$ from $D_{origin}$. The reference set $R$ contains normal samples $R_n = \{r^n_k\}_{k=1}^{K_1}$ and abnormal samples $R_a = \{r^a_k\}_{k=1}^{K_2}$, $K_1$ and $K_2$ denote the number of normal and abnormal reference samples, respectively. ... The training process converges within 20 epochs, with each epoch comprising 500 sampled episodes. ... Following previous works [23, 31], we set the number of normal references as $K_1 \in [1, 2, 4]$. Considering the scarcity of abnormal samples, we only use one abnormal reference ($K_2 = 1$), making our approach highly applicable in real-world scenarios.
Hardware Specification	Yes	The implementation is based on PyTorch@2.1.1, and the experiments are conducted on a single NVIDIA RTX 4090 24GB GPU.
Software Dependencies	Yes	The implementation is based on PyTorch@2.1.1, and the experiments are conducted on a single NVIDIA RTX 4090 24GB GPU.
Experiment Setup	Yes	The model is optimized using AdamW [39] with an initial learning rate of $1 \times 10^{-5}$, which is reduced by a factor of 0.1 at epoch 10 and 15. The training process converges within 20 epochs, with each epoch comprising 500 sampled episodes. Input images are resized to $448 \times 448$ resolution without data augmentation. We set the number of learnable proxies (P) to M = 25 by default and use a loss balance weight (λ) of 1.0.
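
The training configuration quoted in the table can be illustrated with a small, dependency-free Python sketch. It is not taken from the NAGL repository. The function and variable names (`lr_at_epoch`, `sample_episode`, the pool arguments) are hypothetical. It also assumes 0-indexed epochs, a learning rate of 1e-5 (read from the garbled "1 10 5"), and uniform random sampling of references and queries. In PyTorch, a step schedule of this kind is usually expressed with `torch.optim.lr_scheduler.MultiStepLR`; plain Python is used here so the arithmetic is visible.

```python
import random

# Values quoted in the Experiment Setup and Dataset Splits rows.
BASE_LR = 1e-5          # initial learning rate (assumed reading of "1 10 5")
MILESTONES = (10, 15)   # epochs at which the rate is reduced
GAMMA = 0.1             # reduction factor
EPOCHS = 20
EPISODES_PER_EPOCH = 500


def lr_at_epoch(epoch, base_lr=BASE_LR, milestones=MILESTONES, gamma=GAMMA):
    """Step-decay learning rate for a 0-indexed epoch (indexing is an assumption)."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


def sample_episode(normal_pool, abnormal_pool, query_pool, k1=1, k2=1, rng=random):
    """Draw one episode: k1 normal references, k2 abnormal references, one query.

    Hypothetical helper: the pools stand in for items of the training data,
    and uniform sampling is an assumption, not the paper's stated procedure.
    """
    return {
        "normal_refs": rng.sample(normal_pool, k1),
        "abnormal_refs": rng.sample(abnormal_pool, k2),
        "query": rng.choice(query_pool),
    }


if __name__ == "__main__":
    for epoch in (0, 9, 10, 14, 15, 19):
        print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.1e}")

    normals = [f"normal_{i}" for i in range(8)]
    abnormals = [f"abnormal_{i}" for i in range(3)]
    episode = sample_episode(normals, abnormals, normals + abnormals, k1=4, k2=1)
    print(episode)
```

Under these assumptions the rate stays at 1e-5 for epochs 0 to 9, drops to 1e-6 for epochs 10 to 14, and to 1e-7 from epoch 15 onward. A full run would draw `EPOCHS * EPISODES_PER_EPOCH` = 10 000 episodes.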