Rethinking Graph Neural Networks for Anomaly Detection

Authors: Jianheng Tang, Jiajin Li, Ziqi Gao, Jia Li

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the effectiveness of BWGNN on four large-scale anomaly detection datasets." "We conduct extensive experiments on four datasets in both supervised and semi-supervised settings."
Researcher Affiliation | Academia | "1 Hong Kong University of Science and Technology (Guangzhou), 2 Hong Kong University of Science and Technology, 3 Stanford University."
Pseudocode | No | The paper describes the propagation process but does not include a formal pseudocode block or algorithm listing (see the propagation sketch after this table).
Open Source Code | Yes | "Our code and data are released at https://github.com/squareRoot3/Rethinking-Anomaly-Detection."
Open Datasets | Yes | "We choose two widely used datasets in previous works (Liu et al., 2021b; 2020), including the Amazon dataset (McAuley & Leskovec, 2013) for user anomaly detection and the YelpChi dataset (Rayana & Akoglu, 2015) for review anomaly detection. We further construct two large-scale real-world datasets as new graph anomaly detection benchmarks, including the T-Finance dataset based on a transaction network and the T-Social dataset based on a social network." (See the loading sketch after this table.)
Dataset Splits | Yes | "The training ratio is 40% in the supervised scenario and 1% in the semi-supervised scenario, while the remaining data are split by 1:2 for validation and test." "On the T-Social dataset, h is set to 64, C is set to 5, the supervised training ratio is 40%, and the semi-supervised training ratio is 0.01% (with only 17 labeled anomalies). The ratio of validation and test sets is 1:2." (See the split sketch after this table.)
Hardware Specification | Yes | "We conduct all the experiments on a high performance computing server running Ubuntu 20.04 with an Intel(R) Xeon(R) Gold 6226R CPU and 64 GB memory."
Software Dependencies | No | "MLP is implemented by PyTorch (Paszke et al., 2019), and SVM is in Scikit-learn (Pedregosa et al., 2011). For GCN, ChebyNet, GAT, and GraphSAGE, we use the implementation of DGL. Our model is implemented based on PyTorch and DGL (Wang et al., 2019b)." Libraries are named, but no versions are pinned.
Experiment Setup | Yes | "We train all models except SVM for 100 epochs by the Adam optimizer with a learning rate of 0.01, and save the model with the best Macro-F1 in validation. The training ratio is 40% in the supervised scenario and 1% in the semi-supervised scenario, while the remaining data are split by 1:2 for validation and test. On the YelpChi, Amazon, and T-Finance datasets, the dimension h for representations and hidden states in all models is set to 64, and the order C in BWGNN is 2. We use concatenation as the AGG(·) function in BWGNN. On the T-Social dataset, h is set to 64, C is set to 5, the supervised training ratio is 40%, and the semi-supervised training ratio is 0.01% (with only 17 labeled anomalies)." (See the training-loop sketch after this table.)
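
Propagation sketch. Since the paper gives no algorithm listing, the following is a minimal dense-matrix sketch of the beta-wavelet propagation it describes: for order C, the C+1 filters W_{i,C-i} = (L/2)^i (I - L/2)^(C-i) / B(i+1, C-i+1) are applied to the node features and aggregated by concatenation. Function names and the dense formulation are ours, not the authors'; the released code realizes the same filter polynomials with sparse graph operations.

```python
import torch
from scipy.special import beta  # Beta function B(p, q)

def normalized_laplacian(adj: torch.Tensor) -> torch.Tensor:
    """L = I - D^{-1/2} A D^{-1/2}; its eigenvalues lie in [0, 2]."""
    deg = adj.sum(dim=1)
    d_inv_sqrt = torch.where(deg > 0, deg.pow(-0.5), torch.zeros_like(deg))
    return torch.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def bwgnn_propagate(adj: torch.Tensor, x: torch.Tensor, C: int = 2) -> torch.Tensor:
    """Apply the C+1 beta-wavelet filters W_{i,C-i} and concatenate (AGG = concat)."""
    half = normalized_laplacian(adj) / 2.0   # maps the spectrum from [0, 2] to [0, 1]
    eye = torch.eye(adj.shape[0])
    outs = []
    for i in range(C + 1):
        # W_{i,C-i} = (L/2)^i (I - L/2)^(C-i) / B(i+1, C-i+1)
        w = torch.matrix_power(half, i) @ torch.matrix_power(eye - half, C - i)
        outs.append((w / beta(i + 1, C - i + 1)) @ x)
    return torch.cat(outs, dim=-1)
```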
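
Dataset loading. YelpChi and Amazon ship with DGL as built-in fraud-detection benchmarks, so they can be loaded directly; T-Finance and T-Social must be downloaded from the authors' repository. A sketch, assuming the multi-relation graphs are collapsed as in a homogeneous setting; the 'dataset/tfinance' path is hypothetical.

```python
import dgl
from dgl.data import FraudYelpDataset, FraudAmazonDataset

# YelpChi: a review graph with three review-review relation types.
yelp = FraudYelpDataset()[0]
yelp = dgl.to_homogeneous(yelp, ndata=['feature', 'label'])

amazon = FraudAmazonDataset()[0]
amazon = dgl.to_homogeneous(amazon, ndata=['feature', 'label'])

# T-Finance / T-Social: fetch from the authors' repo first; assuming the files
# are DGL binary graphs, dgl.load_graphs returns (graph_list, label_dict).
graphs, _ = dgl.load_graphs('dataset/tfinance')  # hypothetical path
tfinance = graphs[0]
```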
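
Split construction. One way to reproduce the stated protocol (40% or 1% of nodes for training, the remainder split 1:2 into validation and test) is stratified sampling over node indices; the function and variable names here are ours.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_split(labels: np.ndarray, train_ratio: float = 0.40, seed: int = 0):
    """Stratified node split: train_ratio for training, the rest 1:2 val:test."""
    idx = np.arange(len(labels))
    train_idx, rest_idx = train_test_split(
        idx, train_size=train_ratio, stratify=labels, random_state=seed)
    val_idx, test_idx = train_test_split(
        rest_idx, train_size=1 / 3, stratify=labels[rest_idx], random_state=seed)
    return train_idx, val_idx, test_idx

# Semi-supervised scenario: make_split(labels, train_ratio=0.01)
```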
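
Training loop. The stated protocol (Adam, learning rate 0.01, 100 epochs, keep the checkpoint with the best validation Macro-F1) corresponds to a loop like the sketch below; model, graph, features, labels, and the index arrays are assumed to come from the snippets above.

```python
import copy
import torch
import torch.nn.functional as F
from sklearn.metrics import f1_score

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
best_f1, best_state = 0.0, None
for epoch in range(100):
    model.train()
    loss = F.cross_entropy(model(graph, features)[train_idx], labels[train_idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        preds = model(graph, features)[val_idx].argmax(dim=1)
    macro_f1 = f1_score(labels[val_idx].cpu(), preds.cpu(), average='macro')
    if macro_f1 > best_f1:                  # save the best validation checkpoint
        best_f1 = macro_f1
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)           # restore before test evaluation
```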