Towards Understanding and Reducing Graph Structural Noise for GNNs
Authors: Mingze Dong, Yuval Kluger
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the metric on various synthetic and real data and show it has strikingly high consistency with GNN learning performance. We propose a graph rewiring framework named graph propensity score (GPS) that denoises graphs in a feature-aware manner based on self-supervised training. We provide both a theoretical guarantee and extensive benchmarking showing the efficacy of the GPS framework combined with the ESNR metric. (A generic feature-aware rewiring sketch, for illustration only, follows the table.) |
| Researcher Affiliation | Academia | (1) Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA; (2) Department of Pathology, School of Medicine, Yale University, New Haven, CT, USA; (3) Applied Math Program, Yale University, New Haven, CT, USA. Correspondence to: Yuval Kluger <yuval.kluger@yale.edu>. |
| Pseudocode | Yes | Algorithm 1: Biwhitening for matrix C ∈ R^{m×n}. (A hedged biwhitening sketch follows the table.) |
| Open Source Code | Yes | The code of ESNR is available at https://github.com/MingzeDong/ESNR. [...] The code of GPS can be accessed at https://github.com/MingzeDong/GPS. |
| Open Datasets | Yes | We evaluated the listed methods on nine datasets, including the WebKB datasets Cornell, Texas, and Wisconsin; the Wikipedia Network datasets Chameleon and Squirrel; the Actor dataset; and the Planetoid datasets Cora, Citeseer, and Pubmed. [...] For the Chameleon and Squirrel datasets, we included both the raw datasets and the preprocessed datasets suggested by Platonov et al. (2023), yielding 16 data points. For both GCN and SOTA-GNN, we observe that ESNR exhibits strikingly high concordance with the accuracy difference relative to MLP across all datasets, compared with alternative homophily-based metrics, including edge/node homophily and aggregated homophily (Figure 5, Table 1) (Pei et al., 2020; Luan et al., 2022). Further experimental details for both evaluations on real data can be seen in Appendix D.3 and D.4, respectively. |
| Dataset Splits | Yes | For each dataset, we used the train/validation/test data split provided by pytorch_geometric. (A minimal loading sketch follows the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using "pytorch_geometric" (in section 5.1 and Appendix D.2), but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The hyperparameters were optimized by random search through Ray Tune, selecting the model with the highest accuracy on the validation set. [...] Appendix D.5.1, Hyperparameter Setting: listed here are all the hyperparameters selected by random search and used in our study. (Referring to Tables 4-10.) (A hedged Ray Tune sketch follows the table.) |
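
The paper's GPS framework learns edge propensities through self-supervised training; those details are not quoted above, so the sketch below is only a generic illustration of feature-aware rewiring in that spirit, not the paper's method. The function name, the cosine-similarity edge score, and the `drop_frac` threshold are all assumptions.

```python
import torch
import torch.nn.functional as F

def prune_edges_by_feature_score(x, edge_index, drop_frac=0.2):
    """Generic feature-aware rewiring (NOT the paper's GPS): score each
    edge by the cosine similarity of its endpoint features and drop the
    lowest-scoring fraction of edges."""
    src, dst = edge_index                      # edge_index: (2, E) LongTensor
    scores = F.cosine_similarity(x[src], x[dst], dim=-1)
    threshold = torch.quantile(scores, drop_frac)
    keep = scores >= threshold                 # retain high-similarity edges
    return edge_index[:, keep]
```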
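The quoted pseudocode header refers to biwhitening for a matrix C ∈ R^{m×n}. Below is a minimal NumPy sketch of the standard biwhitening recipe (Landa et al., 2021): Sinkhorn-scale an entrywise variance proxy so that scaled row sums equal n and column sums equal m, then rescale C by the square roots of the factors. The use of C² as the variance proxy, the epsilon guard, and the iteration/tolerance settings are assumptions; the paper's Algorithm 1 may differ in detail.

```python
import numpy as np

def biwhiten(C, n_iter=200, tol=1e-8, eps=1e-12):
    """Sinkhorn-style biwhitening sketch: find positive vectors u, v such
    that diag(u) V diag(v) has row sums n and column sums m, where V is an
    entrywise variance proxy, then return diag(sqrt(u)) C diag(sqrt(v))."""
    m, n = C.shape
    V = C ** 2 + eps               # variance proxy (assumption; guards zero rows)
    u, v = np.ones(m), np.ones(n)
    for _ in range(n_iter):
        u_new = n / (V @ v)        # enforce target row sums
        v_new = m / (V.T @ u_new)  # enforce target column sums
        if np.max(np.abs(u_new - u)) < tol and np.max(np.abs(v_new - v)) < tol:
            u, v = u_new, v_new
            break
        u, v = u_new, v_new
    return np.sqrt(u)[:, None] * C * np.sqrt(v)[None, :]
```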
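For the dataset splits, here is a minimal sketch of loading a Planetoid dataset with the split masks shipped by pytorch_geometric; the choice of Cora and of the "public" split argument are assumptions about which of the library's options was used.

```python
from torch_geometric.datasets import Planetoid

# Load Cora with the split bundled by pytorch_geometric.
dataset = Planetoid(root="data/Planetoid", name="Cora", split="public")
data = dataset[0]

# Boolean masks over nodes define the train/validation/test split.
print(int(data.train_mask.sum()), int(data.val_mask.sum()), int(data.test_mask.sum()))
```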
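For the experiment setup, a hedged sketch of random search with Ray Tune that selects the configuration with the highest validation accuracy (Ray >= 2.x Tuner API assumed). The search space and the trainable body are placeholders, not the paper's grids from Tables 4-10.

```python
from ray import tune

def train_fn(config):
    # Train a GNN with config["lr"], config["hidden_dim"], config["dropout"],
    # then evaluate on the validation split (placeholder value below).
    val_acc = 0.0
    return {"val_acc": val_acc}   # final-result dict (Ray >= 2.x function API)

search_space = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "hidden_dim": tune.choice([16, 64, 256]),
    "dropout": tune.uniform(0.0, 0.8),
}

tuner = tune.Tuner(
    train_fn,
    param_space=search_space,
    tune_config=tune.TuneConfig(num_samples=50, metric="val_acc", mode="max"),
)
best = tuner.fit().get_best_result()
print(best.config)
```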