Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Generative Semi-supervised Graph Anomaly Detection
Authors: Hezhe Qiao, Qingsong Wen, Xiaoli Li, Ee-peng Lim, Guansong Pang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on six real-world GAD datasets are performed to establish a benchmark for semi-supervised GAD and show that GGAD substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes. |
| Researcher Affiliation | Collaboration | Hezhe Qiao¹, Qingsong Wen², Xiaoli Li³,⁴, Ee-Peng Lim¹, Guansong Pang¹; ¹School of Computing and Information Systems, Singapore Management University; ²Squirrel AI; ³Institute for Infocomm Research, A\*STAR, Singapore; ⁴A\*STAR Centre for Frontier AI Research, Singapore |
| Pseudocode | Yes | The training algorithms of GGAD are summarized in Algorithm 1 and Algorithm 2. Algorithm 1 describes the full training process of GGAD. Algorithm 2 describes the mini-batch processing for handling very large graph datasets, i.e., DGraph. |
| Open Source Code | Yes | Code is available at https://github.com/mala-lab/GGAD. |
| Open Datasets | Yes | We conduct experiments on six large real-world graph datasets with genuine anomalies from diverse domains, including the co-review network in Amazon [10], transaction record network in T-Finance [50], social networks in Reddit [21], bitcoin transaction in Elliptic [55], co-purchase network in Photo [35] and financial network in DGraph [18]. |
| Dataset Splits | No | To simulate practical scenarios where we need to annotate only a relatively small number of normal nodes, we randomly sample R% of the normal nodes as labeled normal data for training, in which R is chosen in {10, 15, 20, 25}, with the rest of nodes is treated as the testing set. |
| Hardware Specification | Yes | GGAD is implemented in Pytorch 1.6.0 with Python 3.7. and all the experiments are run on a 24-core CPU. |
| Software Dependencies | Yes | GGAD is implemented in Pytorch 1.6.0 with Python 3.7. |
| Experiment Setup | Yes | In GGAD, its weight parameters are optimized using Adam [20] optimizer with a learning rate of 1e-3 by default. For each dataset, the hyperparameters β and λ for two constraints are uniformly set to 1, though GGAD can perform stably with a range of β and λ (see App. C.2). The size of the generated outlier nodes S is set to 5% of \|V_l\| by default and stated otherwise. The affinity margin α is set to 0.7 across all datasets. The perturbation in Eq. (5) is drawn from a Gaussian distribution, with mean and standard variance set to 0.02 and 0.01 respectively, and it is stated otherwise. |
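
The Dataset Splits and Experiment Setup rows describe a sampling protocol and a set of default values that can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function name `split_normal_nodes`, the label convention (0 = normal, 1 = anomaly), the `seed` parameter, the floor rounding, and the dictionary `QUOTED_DEFAULTS` with its key names are all hypothetical. Only the numeric values are taken from the quoted text above.

```python
import random

# Default values as quoted in the Experiment Setup row.
# The key names are illustrative, not identifiers from the GGAD code base.
QUOTED_DEFAULTS = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "beta": 1.0,             # weight of the first constraint
    "lambda": 1.0,           # weight of the second constraint
    "outlier_ratio": 0.05,   # size of S as a fraction of the labeled normal nodes
    "affinity_margin": 0.7,  # alpha
    "noise_mean": 0.02,      # Gaussian perturbation in Eq. (5)
    "noise_std": 0.01,
}


def split_normal_nodes(labels, ratio_percent, seed=0):
    """Sample ratio_percent% of the normal nodes as labeled training data.

    labels: list of ints, 0 for normal and 1 for anomaly (assumed convention).
    Returns (train_idx, test_idx); every node not sampled goes to the test set.
    """
    rng = random.Random(seed)
    normal_idx = [i for i, y in enumerate(labels) if y == 0]
    # Integer arithmetic avoids floating-point rounding surprises;
    # rounding down is an assumption, the quoted text does not specify it.
    n_train = len(normal_idx) * ratio_percent // 100
    train_idx = sorted(rng.sample(normal_idx, n_train))
    train_set = set(train_idx)
    test_idx = [i for i in range(len(labels)) if i not in train_set]
    return train_idx, test_idx


if __name__ == "__main__":
    toy_labels = [0] * 90 + [1] * 10  # toy graph: 90 normal nodes, 10 anomalies
    for r in (10, 15, 20, 25):        # the values of R quoted above
        train_idx, test_idx = split_normal_nodes(toy_labels, r)
        print(r, len(train_idx), len(test_idx))
```

Because the split depends on a random draw, reproducing the reported numbers would also require the seeds used in the paper, which the quoted text does not give.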