Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Collaboration Based Multi-Label Propagation for Fraud Detection
Authors: Haobo Wang, Zhao Li, Jiaming Huang, Pengrui Hui, Weiwei Liu, Tianlei Hu, Gang Chen
IJCAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed method not only outperforms on ordinary multi-label datasets, but is effective and scalable on large-scale e-commerce dataset. Section 4: Experiments. Table 1: Transductive performance comparison on ordinary multi-label datasets. Table 2: Transductive performance comparison of three graph-based algorithms on Taobao-FUD dataset. |
| Researcher Affiliation | Collaboration | (1) Key Lab of Intelligent Computing Based Big Data of Zhejiang Province, Zhejiang University; (2) Alibaba Group, Hangzhou, China; (3) School of Computer Science, Wuhan University |
| Pseudocode | No | The paper describes the algorithms using mathematical equations and descriptive text but does not include a formal pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not provide any links to source code or explicitly state that the code for the described methodology is publicly available. |
| Open Datasets | Yes | We choose four real-world multi-label datasets from different task domains: 1) Medical [Pestian et al., 2007]: a text dataset... 2) Image [Wang et al., 2019]: a collection of... 3) Slashdot [Read et al., 2009]: a web text dataset... 4) Eurlex-sm [Loza Mencía and Fürnkranz, 2008]: a large text dataset... |
| Dataset Splits | Yes | All the datasets are randomly partitioned to 5% labeled data and 95% unlabeled data. We randomly select 5% examples as labeled data and the rest are used to evaluate the transductive performance. |
| Hardware Specification | No | The computations are performed on MaxCompute platform, a fast, distributed and fully hosted GB/TB/PB level data warehouse solution. We use three computation instances for time comparison and 3000 instances for performance comparison. This mentions a platform and generic instances but no specific hardware components like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions applying a three-layer neural network and building a k-NN adjacency graph, but it does not specify any software names with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | For our methods, γ is selected from {0.1, 1, 10, 100}. α is chosen from {0.01, 0.05, 0.1, 0.2, 0.5}. β, λ and µ are empirically fixed to 0.1. For Deep Fraud, we apply a three layer neural network with ReLU activation. The hidden size is set as 128. The learning rate and regularization parameter are set as 0.001 and 0.5. When building graphs, k is set as 20. |
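
Because no source code is released, the quoted setup can only be approximated. The following is a minimal Python sketch of three pieces named in the table: the random 5% labeled / 95% unlabeled partition, a k-NN adjacency graph with k = 20, and the γ/α search grid with β, λ and µ fixed to 0.1. It is not the authors' implementation. The function names, the random seed, the Euclidean distance metric, and the symmetrization of the graph are all assumptions made for illustration; the paper excerpts above do not specify them.

```python
import itertools

import numpy as np


def transductive_split(n_examples, labeled_fraction=0.05, seed=0):
    """Randomly partition example indices into labeled and unlabeled sets.

    The 5% / 95% ratio comes from the table; the seed is an assumption.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_examples)
    n_labeled = max(1, int(round(labeled_fraction * n_examples)))
    return perm[:n_labeled], perm[n_labeled:]


def knn_adjacency(features, k=20):
    """Build a symmetric boolean k-NN adjacency matrix.

    k = 20 comes from the table. Euclidean distance and the
    "edge if either point lists the other" symmetrization are assumptions.
    Requires k < number of rows in `features`.
    """
    sq_norms = np.sum(features ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows.
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # exclude self-neighbors
    neighbors = np.argsort(dist, axis=1)[:, :k]
    n = features.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = True
    return adj | adj.T


# Search ranges quoted in the "Experiment Setup" row.
GAMMA_GRID = [0.1, 1, 10, 100]
ALPHA_GRID = [0.01, 0.05, 0.1, 0.2, 0.5]
# "lam" stands in for λ, since `lambda` is a Python keyword.
FIXED = {"beta": 0.1, "lam": 0.1, "mu": 0.1}


def hyperparameter_grid():
    """Enumerate every (gamma, alpha) pair together with the fixed values."""
    return [
        dict(gamma=g, alpha=a, **FIXED)
        for g, a in itertools.product(GAMMA_GRID, ALPHA_GRID)
    ]
```

The dense distance matrix is practical only for the smaller benchmark datasets; a graph at the scale of the Taobao-FUD experiments would need an approximate or distributed neighbor search, which this sketch does not attempt.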