Generative Semi-supervised Graph Anomaly Detection
Authors: Hezhe Qiao, Qingsong Wen, Xiaoli Li, Ee-peng Lim, Guansong Pang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on six real-world GAD datasets are performed to establish a benchmark for semi-supervised GAD and show that GGAD substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes. |
| Researcher Affiliation | Collaboration | Hezhe Qiao¹, Qingsong Wen², Xiaoli Li³,⁴, Ee-Peng Lim¹, Guansong Pang¹ (¹School of Computing and Information Systems, Singapore Management University; ²Squirrel AI; ³Institute for Infocomm Research, A*STAR, Singapore; ⁴A*STAR Centre for Frontier AI Research, Singapore) |
| Pseudocode | Yes | The training algorithms of GGAD are summarized in Algorithm 1 and Algorithm 2. Algorithm 1 describes the full training process of GGAD. Algorithm 2 describes the mini-batch processing for handling very large graph datasets, i.e., DGraph. |
| Open Source Code | Yes | Code is available at https://github.com/mala-lab/GGAD. |
| Open Datasets | Yes | We conduct experiments on six large real-world graph datasets with genuine anomalies from diverse domains, including the co-review network in Amazon [10], the transaction record network in T-Finance [50], social networks in Reddit [21], the bitcoin transaction network in Elliptic [55], the co-purchase network in Photo [35], and the financial network in DGraph [18]. |
| Dataset Splits | No | To simulate practical scenarios where we need to annotate only a relatively small number of normal nodes, we randomly sample R% of the normal nodes as labeled normal data for training, in which R is chosen in {10, 15, 20, 25}, with the rest of the nodes treated as the testing set. (A minimal code sketch of this split protocol is given after the table.) |
| Hardware Specification | Yes | GGAD is implemented in PyTorch 1.6.0 with Python 3.7, and all the experiments are run on a 24-core CPU. |
| Software Dependencies | Yes | GGAD is implemented in PyTorch 1.6.0 with Python 3.7. |
| Experiment Setup | Yes | In GGAD, the weight parameters are optimized using the Adam [20] optimizer with a learning rate of 1e-3 by default. For each dataset, the hyperparameters β and λ for the two constraints are uniformly set to 1, though GGAD performs stably over a range of β and λ (see App. C.2). The size of the generated outlier node set S is set to 5% of |Vl| by default unless stated otherwise. The affinity margin α is set to 0.7 across all datasets. The perturbation in Eq. (5) is drawn from a Gaussian distribution with mean 0.02 and standard deviation 0.01, unless stated otherwise. (A hedged configuration sketch covering these defaults follows the table.) |
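The split protocol quoted in the Dataset Splits row is mechanical enough to restate as code. The sketch below is ours, not taken from the GGAD repository: it assumes binary node labels with 0 = normal and 1 = anomaly, and `R` given as a percentage in {10, 15, 20, 25}; the function name and seeding are illustrative.

```python
import numpy as np

def semi_supervised_split(labels, R=15, seed=0):
    """Sample R% of the normal nodes as labeled training data;
    every remaining node (normal or anomalous) forms the test set."""
    rng = np.random.default_rng(seed)
    normal_idx = np.flatnonzero(labels == 0)   # assumes 0 = normal
    n_train = int(len(normal_idx) * R / 100)
    train_idx = rng.choice(normal_idx, size=n_train, replace=False)
    test_idx = np.setdiff1d(np.arange(labels.shape[0]), train_idx)
    return train_idx, test_idx

# e.g. train_idx, test_idx = semi_supervised_split(y, R=15)
```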
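Eq. (5) itself is not reproduced in this report, so the following is only a plausible reading of the setup paragraph: draw S = 5% of |Vl| pseudo-outlier representations by adding N(0.02, 0.01²) Gaussian noise to sampled labeled-normal representations, and optimize with Adam at learning rate 1e-3. The function name, the config dictionary, and the choice of which normal representations to perturb are our assumptions; the authoritative implementation is at https://github.com/mala-lab/GGAD.

```python
import torch

# Reported defaults: Adam with lr 1e-3; beta = lambda = 1; alpha = 0.7;
# S = 5% of |Vl|; Gaussian perturbation with mean 0.02, std 0.01.
CONFIG = dict(lr=1e-3, beta=1.0, lam=1.0, alpha=0.7,
              outlier_ratio=0.05, noise_mean=0.02, noise_std=0.01)

def generate_outlier_reps(h_labeled, cfg=CONFIG):
    """Perturb sampled labeled-normal representations with Gaussian noise
    to obtain S = outlier_ratio * |Vl| pseudo-anomaly representations."""
    num_outliers = max(1, int(cfg["outlier_ratio"] * h_labeled.size(0)))
    idx = torch.randint(0, h_labeled.size(0), (num_outliers,))
    noise = torch.normal(cfg["noise_mean"], cfg["noise_std"],
                         size=(num_outliers, h_labeled.size(1)))
    return h_labeled[idx] + noise

# The reported optimizer, using the same defaults:
# optimizer = torch.optim.Adam(model.parameters(), lr=CONFIG["lr"])
```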