Rethinking Graph Neural Networks for Anomaly Detection

Authors: Jianheng Tang, Jiajin Li, Ziqi Gao, Jia Li

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the effectiveness of BWGNN on four large-scale anomaly detection datasets." "We conduct extensive experiments on four datasets in both supervised and semi-supervised settings."
Researcher Affiliation | Academia | "1 Hong Kong University of Science and Technology (Guangzhou), 2 Hong Kong University of Science and Technology, 3 Stanford University."
Pseudocode | No | The paper describes the propagation process but does not include a formal pseudocode block or algorithm listing (see the propagation sketch after this table).
Open Source Code | Yes | "Our code and data are released at https://github.com/squareRoot3/Rethinking-Anomaly-Detection."
Open Datasets | Yes | "We choose two widely used datasets in previous works (Liu et al., 2021b; 2020), including the Amazon dataset (McAuley & Leskovec, 2013) for user anomaly detection and the YelpChi dataset (Rayana & Akoglu, 2015) for review anomaly detection. We further construct two large-scale real-world datasets as new graph anomaly detection benchmarks, including the T-Finance dataset based on a transaction network and the T-Social dataset based on a social network." (See the loading sketch after this table.)
Dataset Splits | Yes | "The training ratio is 40% in the supervised scenario and 1% in the semi-supervised scenario, while the remaining data are split by 1:2 for validation and test." "On the T-Social dataset, h is set to 64, C is set to 5, the supervised training ratio is 40%, and the semi-supervised training ratio is 0.01% (with only 17 labeled anomalies). The ratio of validation and test sets is 1:2." (See the split sketch after this table.)
Hardware Specification | Yes | "We conduct all the experiments on a high performance computing server running Ubuntu 20.04 with an Intel(R) Xeon(R) Gold 6226R CPU and 64 GB memory."
Software Dependencies | No | "MLP is implemented by PyTorch (Paszke et al., 2019), and SVM is in Scikit-learn (Pedregosa et al., 2011). For GCN, ChebyNet, GAT, and GraphSAGE, we use the implementation of DGL. Our model is implemented based on PyTorch and DGL (Wang et al., 2019b)." Libraries are named, but no versions are pinned.
Experiment Setup | Yes | "We train all models except SVM for 100 epochs by the Adam optimizer with a learning rate of 0.01, and save the model with the best Macro-F1 in validation. The training ratio is 40% in the supervised scenario and 1% in the semi-supervised scenario, while the remaining data are split by 1:2 for validation and test. On the YelpChi, Amazon, and T-Finance datasets, the dimension h for representations and hidden states in all models is set to 64, and the order C in BWGNN is 2. We use concatenation as the AGG(·) function in BWGNN. On the T-Social dataset, h is set to 64, C is set to 5, the supervised training ratio is 40%, and the semi-supervised training ratio is 0.01% (with only 17 labeled anomalies)." (See the training-loop sketch after this table.)
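
Propagation sketch. Since the paper gives no algorithm listing, the following is a minimal dense-matrix sketch of the beta-wavelet propagation it describes: for order C, the C+1 filters W_{i,C-i} = (L/2)^i (I - L/2)^(C-i) / B(i+1, C-i+1) are applied to the node features and aggregated by concatenation. Function names and the dense formulation are ours, not the authors'; the released code realizes the same filter polynomials with sparse graph operations.

```python
import torch
from scipy.special import beta  # Beta function B(p, q)

def normalized_laplacian(adj: torch.Tensor) -> torch.Tensor:
    """L = I - D^{-1/2} A D^{-1/2}; its eigenvalues lie in [0, 2]."""
    deg = adj.sum(dim=1)
    d_inv_sqrt = torch.where(deg > 0, deg.pow(-0.5), torch.zeros_like(deg))
    return torch.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def bwgnn_propagate(adj: torch.Tensor, x: torch.Tensor, C: int = 2) -> torch.Tensor:
    """Apply the C+1 beta-wavelet filters W_{i,C-i} and concatenate (AGG = concat)."""
    half = normalized_laplacian(adj) / 2.0   # maps the spectrum from [0, 2] to [0, 1]
    eye = torch.eye(adj.shape[0])
    outs = []
    for i in range(C + 1):
        # W_{i,C-i} = (L/2)^i (I - L/2)^(C-i) / B(i+1, C-i+1)
        w = torch.matrix_power(half, i) @ torch.matrix_power(eye - half, C - i)
        outs.append((w / beta(i + 1, C - i + 1)) @ x)
    return torch.cat(outs, dim=-1)
```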
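
Dataset loading. YelpChi and Amazon ship with DGL as built-in fraud-detection benchmarks, so they can be loaded directly; T-Finance and T-Social must be downloaded from the authors' repository. A sketch, assuming the multi-relation graphs are collapsed as in a homogeneous setting; the 'dataset/tfinance' path is hypothetical.

```python
import dgl
from dgl.data import FraudYelpDataset, FraudAmazonDataset

# YelpChi: a review graph with three review-review relation types.
yelp = FraudYelpDataset()[0]
yelp = dgl.to_homogeneous(yelp, ndata=['feature', 'label'])

amazon = FraudAmazonDataset()[0]
amazon = dgl.to_homogeneous(amazon, ndata=['feature', 'label'])

# T-Finance / T-Social: fetch from the authors' repo first; assuming the files
# are DGL binary graphs, dgl.load_graphs returns (graph_list, label_dict).
graphs, _ = dgl.load_graphs('dataset/tfinance')  # hypothetical path
tfinance = graphs[0]
```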
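
Split construction. One way to reproduce the stated protocol (40% or 1% of nodes for training, the remainder split 1:2 into validation and test) is stratified sampling over node indices; the function and variable names here are ours.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_split(labels: np.ndarray, train_ratio: float = 0.40, seed: int = 0):
    """Stratified node split: train_ratio for training, the rest 1:2 val:test."""
    idx = np.arange(len(labels))
    train_idx, rest_idx = train_test_split(
        idx, train_size=train_ratio, stratify=labels, random_state=seed)
    val_idx, test_idx = train_test_split(
        rest_idx, train_size=1 / 3, stratify=labels[rest_idx], random_state=seed)
    return train_idx, val_idx, test_idx

# Semi-supervised scenario: make_split(labels, train_ratio=0.01)
```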
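
Training loop. The stated protocol (Adam, learning rate 0.01, 100 epochs, keep the checkpoint with the best validation Macro-F1) corresponds to a loop like the sketch below; model, graph, features, labels, and the index arrays are assumed to come from the snippets above.

```python
import copy
import torch
import torch.nn.functional as F
from sklearn.metrics import f1_score

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
best_f1, best_state = 0.0, None
for epoch in range(100):
    model.train()
    loss = F.cross_entropy(model(graph, features)[train_idx], labels[train_idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        preds = model(graph, features)[val_idx].argmax(dim=1)
    macro_f1 = f1_score(labels[val_idx].cpu(), preds.cpu(), average='macro')
    if macro_f1 > best_f1:                  # save the best validation checkpoint
        best_f1 = macro_f1
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)           # restore before test evaluation
```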