GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks
Authors: Yuwen Li, Miao Xiong, Bryan Hooi
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on 6 datasets and 6 experimental settings demonstrate that GRAPHCLEANER outperforms the closest baseline, with an average improvement of 0.14 in F1 score, and 0.16 in MCC. |
| Researcher Affiliation | Academia | (1) School of Computing, National University of Singapore, Singapore; (2) Institute of Data Science, National University of Singapore, Singapore. |
| Pseudocode | Yes | Algorithm 1 Synthetic Mislabel Dataset Generation; Algorithm 2 Neighborhood-Aware Mislabel Detector |
| Open Source Code | Yes | Corrected datasets and code are available at https://github.com/lywww/GraphCleaner/tree/master. |
| Open Datasets | Yes | We use 6 datasets, namely, Cora, CiteSeer and PubMed (Yang et al., 2016), Computers and Photo (Shchur et al., 2018), OGB-arxiv (Hu et al., 2020)... We publicly release 2 improved variants of PubMed dataset: PubMedCleaned and PubMedMulti for more accurate evaluation. |
| Dataset Splits | Yes | The node set V is partitioned into training, validation, and test sets, denoted by V_train, V_val, and V_test. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper. |
| Experiment Setup | Yes | We use three mislabel rates, ϵ = 0.1, 0.05, 0.025, for realistic concern. ... Specifically, ϵ is set as 0.1, 0.05, 0.025... Since the average label error reported in Northcutt et al. (2021b) is 3.4%, we simply set the threshold as 0.97. All our experiments and case studies use this threshold. ... The maximum neighborhood size K determines the range of neighborhood we consider. To investigate the robustness of GRAPHCLEANER to K, we vary K from 1 to 5 with other parameters fixed. |
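
The experiment setup and the reported metrics can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes mislabels are injected by flipping a fraction ϵ of labels uniformly at random to a different class (the paper's Algorithm 1 may use a different noise model), and it scores a detector's binary "mislabelled" predictions with F1 and MCC. The function names `inject_mislabels` and `f1_and_mcc` are hypothetical and chosen only for this example.

```python
import math
import random


def inject_mislabels(labels, eps, num_classes, seed=0):
    """Flip round(eps * n) labels to a different class (uniform noise; an assumption)."""
    rng = random.Random(seed)
    n = len(labels)
    n_flip = round(eps * n)
    flip_idx = rng.sample(range(n), n_flip)  # distinct positions to corrupt
    noisy = list(labels)
    is_mislabelled = [False] * n
    for i in flip_idx:
        # An offset in [1, num_classes - 1] modulo num_classes never maps a class to itself.
        offset = rng.randrange(1, num_classes)
        noisy[i] = (labels[i] + offset) % num_classes
        is_mislabelled[i] = True
    return noisy, is_mislabelled


def f1_and_mcc(y_true, y_pred):
    """F1 and Matthews correlation coefficient for binary mislabel detection."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den > 0 else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den > 0 else 0.0
    return f1, mcc


if __name__ == "__main__":
    clean = [i % 3 for i in range(1000)]  # toy labels, not a real benchmark
    for eps in (0.1, 0.05, 0.025):  # the three mislabel rates quoted above
        noisy, mask = inject_mislabels(clean, eps, num_classes=3, seed=0)
        # A trivial stand-in "detector" that flags every node whose label changed.
        pred = [a != b for a, b in zip(clean, noisy)]
        print(eps, sum(mask), f1_and_mcc(mask, pred))
```

MCC is reported alongside F1 because mislabelled nodes are a small minority at these rates, and MCC also accounts for true negatives, which F1 ignores.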