Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Association-Focused Path Aggregation for Graph Fraud Detection

Authors: Tian Qiu, Wenda Li, Zunlei Feng, Jie Lei, Tao Wang, Yi Gao, Mingli Song, Yang Gao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across datasets in multiple fraud scenarios demonstrate that the proposed GPA outperforms mainstream fraud detectors by up to +15% in Average Precision (AP). Additionally, GPA exhibits enhanced robustness to noisy labels and provides excellent interpretability by uncovering implicit fraudulent patterns across broader contexts.
Researcher Affiliation	Collaboration	1 State Key Laboratory of Blockchain and Data Security, Zhejiang University 2 Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security 3 College of Computer Science, Zhejiang University of Technology 4 Department of Planning, Ministry of Emergency Management Big Data Center
Pseudocode	No	The paper describes the methodology using text and mathematical equations, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code	Yes	Code is available at https://github.com/horrible-dong/GPA.
Open Datasets	Yes	To further facilitate this line of research, we focus on the currently prevalent yet underexplored field of internet fraud detection. Based on established fraud rules, we synthesize the first internet fraud dataset, G-Internet, to support research on interpretable association analysis. Extensive experiments across datasets in multiple fraud scenarios encompassing the internet, finance, social networks, and online reviews demonstrate that the proposed GPA outperforms various mainstream fraud detectors in Area Under the Curve (AUC) and Average Precision (AP). Meanwhile, GPA exhibits stronger robustness to noisy labels. It also provides excellent interpretability that can uncover common patterns within fraud-related paths through global pattern interaction and similarity computation, showcasing a more comprehensive view of the associations among diverse fraudulent entities.
Dataset Splits	Yes	For datasets except Elliptic, the split ratio for training, validation, and testing is 4:2:4. Detailed dataset descriptions are provided in Appendix A.2. Elliptic [40]... The dataset is split into training, validation, and testing sets with proportions of 45.86%, 18.34%, and 35.80%, respectively, based on transaction timestamps as per official recommendations. G-Internet has been thoroughly detailed in A.1. The dataset is partitioned into training, validation, and testing sets with proportions of 40.00%, 20.00%, and 40.00%, respectively. T-Finance [13]... The dataset is partitioned into training, validation, and testing sets with proportions of 40.00%, 20.00%, and 40.00%, respectively. T-Social [13]... The dataset is partitioned into training, validation, and testing sets with proportions of 40.00%, 20.00%, and 40.00%, respectively. YelpChi [41]... The dataset is partitioned into training, validation, and testing sets with proportions of 40.00%, 20.00%, and 40.00%, respectively. Amazon [42]... The dataset is partitioned into training, validation, and testing sets with proportions of 40.00%, 20.00%, and 40.00%, respectively.
Hardware Specification	Yes	All experiments use a batch size of 1024 and are conducted on a single NVIDIA GeForce RTX 3090 GPU.
Software Dependencies	No	The paper mentions using the AdamW optimizer and PyTorch (implied by typical GNN frameworks), but does not specify exact version numbers for these or other software libraries/dependencies.
Experiment Setup	Yes	The experiments for the proposed GPA are conducted using the AdamW optimizer with an initial learning rate of 1e-4 or 1e-3 and a weight decay of 5e-4. It adopts minibatch sampling of user nodes per iteration, training for a maximum of 200 epochs from scratch. The baseline models use the officially recommended settings. All numerical results are the averages across 10 different random seeds. Detailed GPA model settings can be found in Appendix A.3. All experiments use a batch size of 1024 and are conducted on a single NVIDIA GeForce RTX 3090 GPU.
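
For readers who want to reproduce the 4:2:4 partition reported under Dataset Splits, the following is a minimal sketch and not the authors' code. It assumes a uniformly random split over node indices; the table does not say whether the paper's split is random or stratified. The function name `split_indices` and its parameters are hypothetical. It does not apply to Elliptic, which is split by transaction timestamp.

```python
import random


def split_indices(num_nodes, ratios=(0.4, 0.2, 0.4), seed=0):
    """Shuffle node indices and cut them into train/val/test lists.

    Assumption: a uniformly random split. The source only reports the
    4:2:4 ratio, not the sampling procedure.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)  # local RNG so the split is repeatable per seed
    indices = list(range(num_nodes))
    rng.shuffle(indices)
    n_train = int(round(ratios[0] * num_nodes))
    n_val = int(round(ratios[1] * num_nodes))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so no index is dropped
    return train, val, test


# Example: 1000 nodes give 400 / 200 / 400 indices.
train, val, test = split_indices(1000, seed=0)
print(len(train), len(val), len(test))
```

Since the reported results are averages over 10 random seeds, one plausible use is to call the function once per seed; whether the seed in the paper also changes the split, or only model initialization, is not stated here.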