Towards Understanding and Reducing Graph Structural Noise for GNNs

Authors: Mingze Dong, Yuval Kluger

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the metric in various synthetic and real data and show it has strikingly high consistency with GNN learning performance. We propose a graph rewiring framework named graph propensity score (GPS) that denoises graphs in a feature-aware manner based on self-supervised training. We provide both a theoretical guarantee and extensive benchmarking showing the efficacy of the GPS framework combined with the ESNR metric.
Researcher Affiliation | Academia | ¹Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA; ²Department of Pathology, School of Medicine, Yale University, New Haven, CT, USA; ³Applied Math Program, Yale University, New Haven, CT, USA. Correspondence to: Yuval Kluger <yuval.kluger@yale.edu>.
Pseudocode | Yes | Algorithm 1: Biwhitening for matrix C ∈ ℝ^{m×n}.
Open Source Code | Yes | The code of ESNR is available at https://github.com/MingzeDong/ESNR. [...] The code of GPS can be accessed at https://github.com/MingzeDong/GPS.
Open Datasets | Yes | We evaluated the listed methods on nine datasets, including the WebKB datasets Cornell, Texas, and Wisconsin; the Wikipedia Network datasets Chameleon and Squirrel; the Actor dataset; and the Planetoid datasets Cora, Citeseer, and Pubmed. [...] For the Chameleon and Squirrel datasets, we included both the raw datasets and the preprocessed datasets suggested by (Platonov et al., 2023), yielding 16 data points. For both GCN and SOTA-GNN, we observe that ESNR exhibits strikingly high concordance with the accuracy difference relative to MLP across all datasets, compared with alternative homophily-based metrics, including edge/node homophily and aggregated homophily (Figure 5, Table 1) (Pei et al., 2020; Luan et al., 2022). Further experimental details for both evaluations in real data can be seen in Appendix D.3 and D.4, respectively.
Dataset Splits | Yes | For each dataset, we used the train/validation/test data split provided by pytorch_geometric.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies | No | The paper mentions using "pytorch_geometric" (in Section 5.1 and Appendix D.2), but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | The hyperparameters were optimized by random search through Ray Tune, selecting the model with the highest accuracy on the validation set. [...] Appendix D.5.1 (Hyperparameter Setting) lists all the hyperparameters selected by random search and used in the study (see Tables 4-10).
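The hyperparameter selection procedure quoted above — sample random configurations, train, and keep the configuration with the best validation accuracy — can be sketched with the standard library alone. This is a minimal stand-in, not the authors' Ray Tune setup: the search space values and the `score_fn` hook are hypothetical placeholders.

```python
import random

# Hypothetical search space for illustration; the paper's actual
# hyperparameter ranges are listed in its Appendix D.5.1 (Tables 4-10).
SEARCH_SPACE = {
    "lr": [1e-3, 5e-3, 1e-2],
    "hidden_dim": [16, 64, 256],
    "dropout": [0.0, 0.3, 0.5],
}

def sample_config(space, rng):
    """Draw one random configuration from the search space."""
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(space, score_fn, n_trials=20, seed=0):
    """Run `n_trials` random draws and return the configuration with the
    highest score (e.g. validation accuracy of a trained GNN)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(space, rng)
        score = score_fn(cfg)  # train + evaluate on the validation split
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

In the paper's pipeline, `score_fn` would train a model with the sampled configuration and report its validation accuracy; Ray Tune additionally parallelizes the trials, but the selection logic is the same.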