GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks

Authors: Yuwen Li, Miao Xiong, Bryan Hooi

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on 6 datasets and 6 experimental settings demonstrate that GRAPHCLEANER outperforms the closest baseline, with an average improvement of 0.14 in F1 score, and 0.16 in MCC.
Researcher Affiliation | Academia | (1) School of Computing, National University of Singapore, Singapore; (2) Institute of Data Science, National University of Singapore, Singapore.
Pseudocode | Yes | Algorithm 1 Synthetic Mislabel Dataset Generation; Algorithm 2 Neighborhood-Aware Mislabel Detector
Open Source Code | Yes | Corrected datasets and code are available at https://github.com/lywww/GraphCleaner/tree/master.
Open Datasets | Yes | We use 6 datasets, namely, Cora, CiteSeer and PubMed (Yang et al., 2016), Computers and Photo (Shchur et al., 2018), OGB-arxiv (Hu et al., 2020)... We publicly release 2 improved variants of PubMed dataset: PubMedCleaned and PubMedMulti for more accurate evaluation.
Dataset Splits | Yes | The node set V is partitioned into training, validation, and test sets, denoted by V_train, V_val, and V_test.
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running experiments were mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper.
Experiment Setup | Yes | We use three mislabel rates, ϵ = 0.1, 0.05, 0.025, for realistic concern. ... Specifically, ϵ is set as 0.1, 0.05, 0.025... Since the average label error reported in Northcutt et al. (2021b) is 3.4%, we simply set the threshold as 0.97. All our experiments and case studies use this threshold. ... The maximum neighborhood size K determines the range of neighborhood we consider. To investigate the robustness of GRAPHCLEANER to K, we vary K from 1 to 5 with other parameters fixed.
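
The rows above mention three mislabel rates (ϵ = 0.1, 0.05, 0.025) and report results in F1 and MCC. As a rough illustration of what such an evaluation involves, the Python sketch below injects synthetic label errors at a given rate and scores a binary mislabel detector with both metrics. It is not the paper's Algorithm 1: it assumes uniform random flipping to a different class, and the function names `inject_mislabels` and `f1_and_mcc` are hypothetical helpers introduced here, not part of the GraphCleaner code.

```python
import math
import random


def inject_mislabels(labels, num_classes, eps, seed=0):
    """Flip a fraction eps of labels to a different class, chosen uniformly.

    ASSUMPTION: uniform flipping is an illustrative noise model only.
    Returns the noisy labels and a 0/1 mask marking the flipped positions.
    """
    rng = random.Random(seed)
    n = len(labels)
    flipped = set(rng.sample(range(n), round(eps * n)))
    noisy, mask = [], []
    for i, y in enumerate(labels):
        if i in flipped:
            # Pick any class other than the original one.
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
        mask.append(1 if i in flipped else 0)
    return noisy, mask


def f1_and_mcc(truth, pred):
    """F1 and Matthews correlation coefficient for binary 0/1 sequences."""
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


if __name__ == "__main__":
    clean = [i % 3 for i in range(200)]  # toy labels, 3 classes
    for eps in (0.1, 0.05, 0.025):
        noisy, mask = inject_mislabels(clean, num_classes=3, eps=eps)
        # Stand-in "oracle" detector that flags exactly the changed labels.
        pred = [int(a != b) for a, b in zip(clean, noisy)]
        print(eps, sum(mask), f1_and_mcc(mask, pred))
```

In a real evaluation, `pred` would come from the detector under test rather than from comparing against the clean labels; the oracle here only shows how the ground-truth mask and the two metrics fit together.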