reproducibilityindex.ai

Generative Semi-supervised Graph Anomaly Detection

Authors: Hezhe Qiao, Qingsong Wen, Xiaoli Li, Ee-peng Lim, Guansong Pang

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive experiments on six real-world GAD datasets are performed to establish a benchmark for semi-supervised GAD and show that GGAD substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes.
Researcher Affiliation	Collaboration	Hezhe Qiao¹, Qingsong Wen², Xiaoli Li³,⁴, Ee-Peng Lim¹, Guansong Pang¹; ¹School of Computing and Information Systems, Singapore Management University; ²Squirrel AI; ³Institute for Infocomm Research, A*STAR, Singapore; ⁴A*STAR Centre for Frontier AI Research, Singapore
Pseudocode	Yes	The training algorithms of GGAD are summarized in Algorithm 1 and Algorithm 2. Algorithm 1 describes the full training process of GGAD. Algorithm 2 describes the mini-batch processing for handling very large graph datasets, i.e., DGraph.
Open Source Code	Yes	Code is available at https://github.com/mala-lab/GGAD.
Open Datasets	Yes	We conduct experiments on six large real-world graph datasets with genuine anomalies from diverse domains, including the co-review network in Amazon [10], transaction record network in T-Finance [50], social networks in Reddit [21], bitcoin transaction in Elliptic [55], co-purchase network in Photo [35] and financial network in DGraph [18].
Dataset Splits	No	To simulate practical scenarios where we need to annotate only a relatively small number of normal nodes, we randomly sample R% of the normal nodes as labeled normal data for training, in which R is chosen in {10, 15, 20, 25}, with the rest of the nodes treated as the testing set.
Hardware Specification	Yes	GGAD is implemented in PyTorch 1.6.0 with Python 3.7, and all the experiments are run on a 24-core CPU.
Software Dependencies	Yes	GGAD is implemented in PyTorch 1.6.0 with Python 3.7.
Experiment Setup	Yes	In GGAD, the weight parameters are optimized using the Adam [20] optimizer with a learning rate of 1e-3 by default. For each dataset, the hyperparameters β and λ for the two constraints are uniformly set to 1, though GGAD can perform stably with a range of β and λ (see App. C.2). The size of the generated outlier nodes S is set to 5% of |V_l| by default unless stated otherwise. The affinity margin α is set to 0.7 across all datasets. The perturbation in Eq. (5) is drawn from a Gaussian distribution, with mean and standard deviation set to 0.02 and 0.01 respectively, unless stated otherwise.
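
The Dataset Splits and Experiment Setup rows can be illustrated with a minimal sketch in plain Python. It is not taken from the GGAD repository: the names GGADSetup, split_nodes, num_outliers and sample_perturbation are hypothetical, the label convention (0 = normal, 1 = anomaly) is an assumption, and so is the rounding used to turn 5% of |V_l| into an integer. Only the numeric defaults come from the table above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GGADSetup:
    """Defaults quoted in the Experiment Setup row (field names are hypothetical)."""
    learning_rate: float = 1e-3   # Adam learning rate
    beta: float = 1.0             # weight of one constraint (beta)
    lam: float = 1.0              # weight of the other constraint (lambda)
    outlier_ratio: float = 0.05   # S as a fraction of |V_l|
    affinity_margin: float = 0.7  # alpha
    noise_mean: float = 0.02      # mean of the Gaussian perturbation in Eq. (5)
    noise_std: float = 0.01       # standard deviation of that perturbation


def split_nodes(labels, r_percent, seed=0):
    """Sample r_percent of the normal nodes for training; all other nodes are test.

    Assumes labels[i] == 0 marks a normal node and 1 marks an anomaly.
    """
    rng = random.Random(seed)
    normal = [i for i, y in enumerate(labels) if y == 0]
    n_train = int(len(normal) * r_percent / 100)
    train = sorted(rng.sample(normal, n_train))
    train_set = set(train)
    # The test set keeps every remaining node, normal or anomalous.
    test = [i for i in range(len(labels)) if i not in train_set]
    return train, test


def num_outliers(n_labeled, setup):
    """Number of generated outlier nodes S; rounding and the floor of 1 are assumptions."""
    return max(1, round(setup.outlier_ratio * n_labeled))


def sample_perturbation(dim, setup, rng):
    """Draw one Gaussian perturbation vector with the quoted mean and spread."""
    return [rng.gauss(setup.noise_mean, setup.noise_std) for _ in range(dim)]
```

For example, split_nodes(labels, 20) on a graph with 90 normal and 10 anomalous nodes would place 18 normal nodes in the training set and the remaining 82 nodes, including all 10 anomalies, in the test set. Fixing the seed matters here because the table reports no predefined split, so the sampled training nodes would otherwise differ between runs.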