Anomaly Subgraph Detection through High-Order Sampling Contrastive Learning

Authors: Ying Sun, Wenjun Wang, Nannan Wu, Chunlong Bao

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate ASD-HC against five state-of-the-art baselines using five benchmark datasets. ASD-HC outperforms the baselines by over 13.01% in AUC score. Various experiments demonstrate that our approach effectively detects anomaly subgraphs within large-scale graphs." (An illustrative AUC-evaluation sketch follows the table.)
Researcher Affiliation | Academia | Ying Sun (1,2), Wenjun Wang (1,3), Nannan Wu (1), and Chunlong Bao (2). (1) College of Intelligence and Computing, Tianjin University, China; (2) School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, China; (3) Yazhou Bay Innovation Institute, Hainan Tropical Ocean University, China. {yingsun, wjwang, nannan.wu}@tju.edu.cn, baochunlong0@gmail.com
Pseudocode | Yes | Algorithm 1: Anomaly Subgraph Detection with High-Order Neighborhood Sampling Contrastive Learning (ASD-HC). (A sketch of the sampling step appears after the table.)
Open Source Code | No | The paper does not contain an explicit statement about releasing open-source code, nor does it provide a link to a code repository for the described methodology.
Open Datasets | Yes | "Our paper leverages five widely recognized datasets to assess the performance of our algorithm and baselines. These datasets are commonly employed in other deep learning-based approaches [Zheng et al., 2023; Liu et al., 2022; Jin et al., 2021], and their key information is outlined in Table 1. These datasets are categorized into two types: Citation Networks: Cora (1) [McCallum et al., 2000], CiteSeer (2) [Lawrence et al., 1999], ACM [Sen et al., 2008; Tang et al., 2008]. Social Networks: BlogCatalog (3) and Flickr [Tang and Liu, 2009]." Footnotes provide links: (1) www.cora.justresearch.com, (2) www.scienceindex.com, (3) http://www.blogcatalog.com. (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper describes the datasets and anomaly injection, but it does not specify explicit training, validation, and test splits (e.g., percentages or exact sample counts per split) that would be needed for reproduction.
Hardware Specification | Yes | "We run all experiments on GPUs: GeForce GTX 1080 Ti (11 GB) ×2, with a total memory of 128 GB." (An environment-check sketch follows the table.)
Software Dependencies | No | The paper mentions models like GCN, GAT, and GraphSAGE, but it does not provide specific software library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) that were used for implementation.
Experiment Setup | Yes | "In our experiments, the batch size is set to 200, and the number of epochs is set to 100 for smaller graphs (e.g., Cora and CiteSeer). For larger graphs with a substantial number of nodes or edges (e.g., ACM, Flickr, and BlogCatalog), reflecting more complex graph structures, additional training time is required; in such cases, we set the number of epochs to 400. The learning rate is set in the range [0.001, 0.0035]. In addition to these common parameters, our algorithm introduces three other crucial parameters: the order of neighbors considered (denoted k), the size of the neighbor-subgraph (denoted t), and the significance level of the anomaly, α. Typically, we set k = 3, t = 15, and assign α a value in the range [0.15, 0.25]." (These hyperparameters are gathered into a config sketch after the table.)
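
Sketch 1, AUC evaluation. The report quotes AUC as the headline metric, but the paper does not publish its evaluation code; the function and variable names below are illustrative, assuming scikit-learn's roc_auc_score over per-node anomaly scores.

```python
# Minimal sketch of an AUC-based comparison (illustrative; not the authors' code).
# `anomaly_scores` would come from ASD-HC or a baseline; `labels` marks
# injected-anomaly nodes with 1 and normal nodes with 0.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_auc(anomaly_scores: np.ndarray, labels: np.ndarray) -> float:
    """Return the ROC-AUC of per-node anomaly scores against ground truth."""
    return roc_auc_score(labels, anomaly_scores)

# Hypothetical usage: compare two methods on the same labels.
labels = np.array([0, 0, 1, 0, 1])
asd_hc_scores = np.array([0.1, 0.2, 0.9, 0.3, 0.8])
baseline_scores = np.array([0.4, 0.1, 0.6, 0.5, 0.2])
print("ASD-HC AUC:  ", evaluate_auc(asd_hc_scores, labels))
print("Baseline AUC:", evaluate_auc(baseline_scores, labels))
```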
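Sketch 2, high-order neighborhood sampling. The paper provides Algorithm 1, but this report does not reproduce it; the following is a minimal sketch of what a k-order neighbor-subgraph sampling step could look like, assuming a BFS truncated at k hops followed by a subsample of at most t neighbors. The function name and the use of networkx are our assumptions, not the authors' implementation.

```python
# Illustrative sketch of k-order neighborhood sampling (not the authors' code):
# collect neighbors up to k hops from an anchor node, then subsample at most t
# of them to form a fixed-size neighbor-subgraph for contrastive learning.
import random
import networkx as nx

def sample_high_order_neighborhood(g: nx.Graph, anchor: int,
                                   k: int = 3, t: int = 15) -> list:
    """Return up to t nodes drawn from the <=k-hop neighborhood of `anchor`."""
    # BFS truncated at depth k; returns {node: hop distance}.
    hop_dist = nx.single_source_shortest_path_length(g, anchor, cutoff=k)
    neighbors = [n for n, d in hop_dist.items() if d > 0]  # exclude the anchor
    if len(neighbors) <= t:
        return neighbors
    return random.sample(neighbors, t)

# Hypothetical usage on a toy graph, with the paper's defaults k=3, t=15.
g = nx.karate_club_graph()
print(sample_high_order_neighborhood(g, anchor=0, k=3, t=15))
```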
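Sketch 3, dataset loading. The paper does not name a loading library; one plausible route for the clean citation graphs is PyTorch Geometric's Planetoid loader. This is an assumption on our part, and note that this line of work evaluates on anomaly-injected variants derived from such graphs.

```python
# Illustrative dataset loading (the paper does not specify its loader).
# PyTorch Geometric ships clean Cora/CiteSeer; anomaly-injected variants
# are typically derived from these graphs afterwards.
from torch_geometric.datasets import Planetoid

cora = Planetoid(root="data/Planetoid", name="Cora")[0]
citeseer = Planetoid(root="data/Planetoid", name="CiteSeer")[0]
print(cora.num_nodes, cora.num_edges)          # 2708 nodes, ~10556 directed edge pairs
print(citeseer.num_nodes, citeseer.num_edges)
```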
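Sketch 4, environment check. For reproduction on comparable hardware (2× GTX 1080 Ti, 11 GB each), a generic PyTorch check — not from the paper — prints the detected GPUs and their memory.

```python
# Generic sanity check of the GPU environment (not from the paper).
import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")
```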
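Sketch 5, hyperparameter config. The stated hyperparameters gathered into a single config; the key names are ours, and since the learning rate and α are reported as ranges, the point values below are assumptions for any single run.

```python
# Hyperparameters as reported in the paper, collected into one config dict.
# Key names are ours; lr and alpha are given as ranges in the paper, so the
# point values here are assumptions for a single run.
ASD_HC_CONFIG = {
    "batch_size": 200,
    "epochs_small": 100,   # Cora, CiteSeer
    "epochs_large": 400,   # ACM, Flickr, BlogCatalog
    "lr": 0.002,           # paper reports the range [0.001, 0.0035]
    "k": 3,                # order of neighbors considered
    "t": 15,               # size of the neighbor-subgraph
    "alpha": 0.2,          # significance level; paper reports [0.15, 0.25]
}

def epochs_for(dataset: str) -> int:
    """Pick the epoch budget based on graph scale, following the paper's rule."""
    small = {"Cora", "CiteSeer"}
    return ASD_HC_CONFIG["epochs_small"] if dataset in small \
        else ASD_HC_CONFIG["epochs_large"]
```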