DGA-GNN: Dynamic Grouping Aggregation GNN for Fraud Detection

Authors: Mingjiang Duan, Tongya Zheng, Yang Gao, Gang Wang, Zunlei Feng, Xinyu Wang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five datasets suggest that our proposed method achieves a 3%–16% improvement over existing SOTA methods. Code is available at https://github.com/AtwoodDuan/DGA-GNN. Experiments are conducted on five real-world fraud detection datasets, and the paper includes an ablation study and parameter analysis.
Researcher Affiliation | Collaboration | 1. Zhejiang University; 2. Hangzhou City University; 3. Bangsheng Technology Co., Ltd.; 4. ZJU-Bangsun Joint Research Center; 5. Shanghai Institute for Advanced Study of Zhejiang University
Pseudocode | Yes | Algorithm 1 describes the pseudocode of decision tree binning encoding. Algorithm 2 shows the pseudocode of the training procedure.
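The decision-tree binning encoding that Algorithm 1 formalizes can be illustrated compactly. The sketch below is our own minimal reading, not the authors' Algorithm 1: it assumes one shallow decision tree is fit per feature against the fraud labels, its learned split thresholds are reused as bin edges, and each feature value is replaced by a one-hot bin indicator. The function name `binning_encode` is hypothetical, and `k` mirrors the paper's bin-count parameter.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def binning_encode(X, y, k=8):
    """Sketch of per-feature decision-tree binning (assumed reading).

    For each feature, fit a shallow tree on (feature, label) pairs, reuse
    its internal split thresholds as bin edges, and one-hot encode the
    resulting bin index. Output shape: (n_samples, n_features * k).
    """
    n_samples, n_features = X.shape
    encoded = []
    for j in range(n_features):
        tree = DecisionTreeClassifier(max_leaf_nodes=k, random_state=0)
        tree.fit(X[:, j:j + 1], y)
        # Internal nodes carry real split thresholds; leaf slots are marked -2.
        edges = np.sort(tree.tree_.threshold[tree.tree_.feature >= 0])
        bins = np.digitize(X[:, j], edges)            # bin index per sample
        encoded.append(np.eye(k)[np.clip(bins, 0, k - 1)])
    return np.concatenate(encoded, axis=1)
```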
Open Source Code | Yes | Code is available at https://github.com/AtwoodDuan/DGA-GNN.
Open Datasets | Yes | Experiments are conducted on five real-world fraud detection datasets: Elliptic, designed for illicit Bitcoin transaction detection (Weber et al. 2019); T-Finance, a financial transaction fraud dataset (Tang et al. 2022); and T-Social, a social network abnormal account detection dataset (Tang et al. 2022). Additionally, YelpChi and Amazon are included, both widely used as fake review datasets in the graph fraud detection literature (Rayana and Akoglu 2015; McAuley and Leskovec 2013).
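As a practical note, two of these datasets ship with DGL's built-in fraud loaders, so a reproduction can fetch them without manual preprocessing; whether the authors use these loaders is our assumption, not something the paper states.

```python
import dgl

# YelpChi fraud-review graph; the Amazon dataset is loaded analogously
# via dgl.data.FraudAmazonDataset().
dataset = dgl.data.FraudYelpDataset()
graph = dataset[0]                     # heterogeneous graph with 3 relation types
features = graph.ndata["feature"]      # raw node feature matrix
labels = graph.ndata["label"]          # 1 = fraudulent review, 0 = benign
print(graph.etypes, features.shape, labels.float().mean())
```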
Dataset Splits | Yes | For all datasets excluding Elliptic, the proportions for training, validation, and testing are distributed in a 4:2:4 ratio. The partitioning is performed with utilities from the sklearn package, and we maintain a consistent random seed as per prior work. In the case of the Elliptic dataset, the partitioning respects transaction entity timestamps, conforming to official recommendations for dataset division.
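The 4:2:4 split can be reproduced with two chained calls to sklearn's train_test_split; since the paper does not print the exact code or seed value, the snippet below is an assumed reconstruction (the seed 42 and stratification on labels are illustrative choices).

```python
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 42                                     # illustrative; the paper only says "consistent seed"
num_nodes = 10_000                            # placeholder graph size
labels = np.random.randint(0, 2, num_nodes)   # placeholder fraud labels

idx = np.arange(num_nodes)
# First carve off the 40% test set, then split the remaining 60% into
# train (40% of all nodes) and validation (20% of all nodes).
idx_rest, idx_test = train_test_split(
    idx, test_size=0.4, random_state=SEED, stratify=labels)
idx_train, idx_val = train_test_split(
    idx_rest, test_size=1 / 3, random_state=SEED, stratify=labels[idx_rest])
```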
Hardware Specification | Yes | All experiments are run on an NVIDIA A100 GPU and an Intel i9-13900K processor @ 5.80 GHz.
Software Dependencies | No | The paper mentions using the sklearn package and the Adam optimizer, but it does not specify version numbers for its software dependencies.
Experiment Setup | Yes | For all methods involving neural networks, we employ the Adam optimizer with a learning rate of 0.001 and a weight decay of 0.001. The maximum number of iterations is set to 1000. The model achieving the lowest validation loss is saved and subsequently used for test-set predictions. All baseline models are fine-tuned post-initialization using the officially recommended parameters. For DGA-GNN, the number of bins k and the decision threshold z are determined based on the validation set score.
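Those settings map onto a standard checkpoint-on-best-validation training loop. The PyTorch sketch below uses a stand-in linear model and random tensors so it runs as-is; in an actual reproduction the model would be DGA-GNN and the tensors would come from the dataset splits above.

```python
import copy
import torch
import torch.nn.functional as F

# Stand-ins so the loop is runnable; replace with the real graph model and data.
num_nodes, num_feats = 1_000, 32
features = torch.randn(num_nodes, num_feats)
labels = torch.randint(0, 2, (num_nodes,))
idx_train, idx_val = torch.arange(0, 400), torch.arange(400, 600)
model = torch.nn.Linear(num_feats, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
best_val, best_state = float("inf"), None
for epoch in range(1000):                     # maximum of 1000 iterations
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(features)[idx_train], labels[idx_train])
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val = F.cross_entropy(model(features)[idx_val], labels[idx_val]).item()
    if val < best_val:                        # keep the lowest-validation-loss model
        best_val, best_state = val, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)             # restore best checkpoint for test predictions
```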