Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Conditional Diffusion Anomaly Modeling on Graphs

Authors: Chunyu Wei, Haozhe Lin, Yueguo Chen, Yunhai Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5 Experiments. 5.1 Experimental Setup. Datasets: We have extensively employed five diverse datasets from various domains to verify our method. They are the e-finance category dataset Elliptic [Weber et al., 2019], crowd-sourcing category datasets Tolokers [Platonov et al., 2023] and YelpChi [Rayana and Akoglu, 2015], and social media datasets Question [Platonov et al., 2023] and Reddit [Kumar et al., 2019]. For the detail of dataset statistics and processing, please refer to Appendix G. Baselines: We have compared our CGADM with two categories of methods in the context of graph anomaly detection: (1) Standard GNNs, which include GCN [Kipf and Welling, 2017], GIN [Xu et al., 2019], GraphSAGE [Hamilton et al., 2017], and GAT [Velickovic et al., 2018]; (2) GNNs specifically designed for anomaly detection, such as GAS [Li et al., 2019], PCGNN [Liu et al., 2021b], BWGNN [Tang et al., 2022], GHRN [Gao et al., 2023b], XGBGraph [Tang et al., 2023], and CONSISGAD [Chen et al., 2024]; (3) diffusion-based data-centric approaches for GAD: GODM [Ma et al., 2024a], CGenGA [Liu et al., 2023]. For detailed descriptions, please refer to Appendix E. Metrics: Following the evaluation setup employed by most anomaly detection works [Han et al., 2022a], we have chosen the Area Under the Receiver Operating Characteristic Curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC) as our metrics for graph anomaly detection. Both of these metrics range between 0 and 1, and we record them as percentages for convenience. For both metrics, a higher value indicates better performance.
Researcher Affiliation	Academia	Chunyu Wei (1), Haozhe Lin (2), Yueguo Chen (1), Yunhai Wang (1). (1) Renmin University of China, China; (2) Tsinghua University, China. Corresponding author. He works at Big Data and Responsible Artificial Intelligence for National Governance, Renmin University of China.
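Pseudocode	Yes	Algorithm 1: Inference for Anomaly Detection. 1: Initialize $y_T \sim \mathcal{N}(g_\phi(E, X), I)$ 2: for $t = T$ to $1$ do 3: Calculate reparameterized $\hat{y}_0$ according to Equation 10: $\hat{y}_0 = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( y_t - (1 - \sqrt{\bar{\alpha}_t})\, g_\phi(E, X) - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(y_t, t, E, X) \right)$ (13) 4: if $t > 1$ then 5: Draw $z \sim \mathcal{N}(0, I)$ 6: $y_{t-1} = \gamma_0 \hat{y}_0 + \gamma_1 y_t + \gamma_2 g_\phi(E, X) + \sqrt{\tilde{\beta}_t}\, z$, according to Equation 6. 7: else 8: Set $y_{t-1} = \hat{y}_0$ 9: end if 10: end for 11: return $y_0$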
Open Source Code	Yes	The code is available on https://github.com/weicy15/CGADM.
Open Datasets	Yes	Datasets: We have extensively employed five diverse datasets from various domains to verify our method. They are the e-finance category dataset Elliptic [Weber et al., 2019], crowd-sourcing category datasets Tolokers [Platonov et al., 2023] and YelpChi [Rayana and Akoglu, 2015], and social media datasets Question [Platonov et al., 2023] and Reddit [Kumar et al., 2019]. For the detail of dataset statistics and processing, please refer to Appendix G.
Dataset Splits	Yes	For each dataset, we randomly selected 20% of the points as training data, 10% of the points as validation data, and the remaining points as test data.
Hardware Specification	Yes	All experiments were conducted on a Linux machine equipped with an Nvidia GeForce RTX 3090.
Software Dependencies	No	The CUDA version used was 11.1, and the driver version was 455.45.01. We implemented our algorithm and the corresponding baseline methods using PyTorch [Paszke et al., 2019] and the graph computation framework Pytorch-Geometric [Fey and Lenssen, 2019]. For the Random Forest (RF) and Extreme Gradient Boosting Tree (XGBT) that serve as conditional anomaly estimators, we used the RF version implemented in the Scikit-Learn library [Pedregosa et al., 2011]. For XGBoost [Chen and Guestrin, 2016], we utilized its official implementation.
Experiment Setup	Yes	Implementation Details: For CGADM, the layer number of graph convolution is set to three, a value considered reasonable by most works [Liu et al., 2021b]. For our diffusion process, the noise levels at the initial and final time steps, β1 and βT, are set to 1e-4 and 0.02, respectively. Additionally, we employ linear interpolation to divide the time steps between them, which is consistent with DDPM [Ho et al., 2020]. For other implementation details, please refer to Appendix K. Appendix K: We initialize the latent vectors for all models with a Gaussian Distribution, having a mean value of 0 and a standard deviation of 0.01. To ensure a level playing field, the dimension of the hidden layer for all baseline models, as well as our CGADM, is set to 64. We conducted a grid search for hyper-parameter tuning. The learning rates were selected from the set [0.005, 0.01, 0.02, 0.05]. To prevent overfitting, we incorporated an L2 norm with the coefficient tuned from the set [0.001, 0.005, 0.01, 0.02, 0.1]. For all methods, we selected the best models by implementing early stopping when the AUROC on the validation set did not increase for five consecutive epochs.
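
Illustration (metrics): The Research Type row names AUROC and AUPRC as the evaluation metrics. The sketch below shows one dependency-free way to compute both for binary anomaly labels. It is not taken from the paper's code: the function names are hypothetical, and AUPRC is estimated here by average precision, which is a common choice but an assumption about what the authors used.

```python
def auroc(labels, scores):
    """Probability that a random anomaly outscores a random normal node.

    Ties count as half a win. This pairwise form is O(P * N) and is meant
    for illustration, not for large graphs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of the anomalies.

    Tied scores keep their input order (stable sort), which is a
    simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    # Reported as percentages, as in the paper's tables.
    print(f"AUROC: {100 * auroc(y_true, y_score):.2f}")
    print(f"AUPRC (AP): {100 * average_precision(y_true, y_score):.2f}")
```

Both functions assume at least one anomalous and one normal node; in practice a library routine such as those in Scikit-Learn would normally be used instead.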
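
Illustration (inference loop): The Pseudocode row gives Algorithm 1, and the Experiment Setup row gives the linear noise schedule (β1 = 1e-4, βT = 0.02). The sketch below ties the two together in plain Python. Several parts are assumptions rather than facts from this page: the number of steps is not stated, `eps_model` is a hypothetical stand-in for the trained noise network ϵθ, `prior` stands in for gϕ(E, X), and the coefficients γ0, γ1, γ2 and the noise scale use the usual prior-shifted Gaussian posterior from conditional diffusion models (as in CARD), since Equation 6 is not reproduced here. The authors' repository is the reference for the actual implementation.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly interpolate the noise levels from beta_1 to beta_T."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def sample_scores(prior, eps_model, betas, rng):
    """Reverse diffusion over per-node scores, following Algorithm 1.

    prior     : list of floats standing in for g_phi(E, X)
    eps_model : hypothetical callable (y_t, t, prior) -> list of predicted noise
    rng       : random.Random instance
    """
    alphas = [1.0 - b for b in betas]
    alpha_bar, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bar.append(running)

    # Step 1: y_T ~ N(prior, I)
    y = [p + rng.gauss(0.0, 1.0) for p in prior]
    for t in range(len(betas), 0, -1):
        i = t - 1
        ab_t = alpha_bar[i]
        ab_prev = alpha_bar[i - 1] if t > 1 else 1.0
        eps = eps_model(y, t, prior)
        # Step 3: reparameterized estimate of y_0
        y0_hat = [
            (y_n - (1 - math.sqrt(ab_t)) * p_n - math.sqrt(1 - ab_t) * e_n)
            / math.sqrt(ab_t)
            for y_n, p_n, e_n in zip(y, prior, eps)
        ]
        if t == 1:
            # Step 8: the last step returns the estimate without added noise
            return y0_hat
        # Step 6: posterior mean plus noise (assumed CARD-style coefficients)
        g0 = betas[i] * math.sqrt(ab_prev) / (1 - ab_t)
        g1 = (1 - ab_prev) * math.sqrt(alphas[i]) / (1 - ab_t)
        g2 = 1 + (math.sqrt(ab_t) - 1) * (
            math.sqrt(alphas[i]) + math.sqrt(ab_prev)
        ) / (1 - ab_t)
        sigma = math.sqrt((1 - ab_prev) / (1 - ab_t) * betas[i])
        y = [
            g0 * h + g1 * y_n + g2 * p_n + sigma * rng.gauss(0.0, 1.0)
            for h, y_n, p_n in zip(y0_hat, y, prior)
        ]
    return y
```

A quick sanity check is to pass an oracle `eps_model` that returns the exact noise for a known target; the loop should then recover that target, because step 3 inverts the forward noising equation.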
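
Illustration (splits and early stopping): The Dataset Splits row describes a random 20% / 10% / 70% train/validation/test split, and the Experiment Setup row describes early stopping once validation AUROC has not increased for five consecutive epochs. A minimal sketch of both follows. The helper names and the seed are hypothetical, and the paper does not say whether the split is stratified, so a plain uniform shuffle is assumed.

```python
import random


def split_nodes(num_nodes, train_frac=0.2, val_frac=0.1, seed=0):
    """Randomly partition node indices into train, validation and test sets."""
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remaining points
    return train, val, test


class EarlyStopper:
    """Signal a stop once validation AUROC has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def update(self, val_auroc):
        """Record one epoch's validation AUROC; return True when training should stop."""
        if val_auroc > self.best:
            self.best = val_auroc
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

In a training loop, `update` would be called once per epoch with the validation AUROC, and the model checkpoint from the best epoch would be kept for testing.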