Simplified Graph Convolution with Heterophily

Authors: Sudhanshu Chanpuriya, Cameron Musco

NeurIPS 2022

Reproducibility assessment — each variable is listed with its result and the supporting LLM response:
Research Type: Experimental
  "Here we confirm that SGC is indeed ineffective for heterophilous (i.e., non-homophilous) graphs via experiments on synthetic and real-world datasets."
Researcher Affiliation: Academia
  "Sudhanshu Chanpuriya, University of Massachusetts Amherst (schanpuriya@cs.umass.edu); Cameron Musco, University of Massachusetts Amherst (cmusco@cs.umass.edu)"
Pseudocode: Yes
  "Algorithm 1: Adaptive Simple Graph Convolution (ASGC) Filter"
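The paper presents its filter as pseudocode (Algorithm 1) and, per the Software Dependencies row below, implements it with NumPy least squares. The following is a minimal sketch of a per-feature least-squares graph filter in that spirit; the choice of basis (here S x_j through S^K x_j), the ridge regularizer, and the degree normalization are illustrative assumptions, not a transcription of Algorithm 1.

```python
# Illustrative sketch only: a per-feature least-squares graph filter in the
# spirit of ASGC's Algorithm 1. The basis, normalization, and ridge term
# below are assumptions, not the paper's exact specification.
import numpy as np

def asgc_style_filter(adj: np.ndarray, X: np.ndarray, K: int = 4, reg: float = 1e-2) -> np.ndarray:
    """adj: dense symmetric adjacency (n, n); X: node features (n, d)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1.0))     # guard isolated nodes
    S = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}

    # Propagated copies of the features: S X, S^2 X, ..., S^K X.
    props, cur = [], X
    for _ in range(K):
        cur = S @ cur
        props.append(cur)
    P = np.stack(props, axis=2)  # shape (n, d, K)

    X_hat = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        B = P[:, j, :]  # (n, K) basis for feature j
        # Ridge-regularized least squares: coefficients alpha minimizing
        # ||B alpha - x_j||^2 + reg * ||alpha||^2.
        alpha = np.linalg.solve(B.T @ B + reg * np.eye(K), B.T @ X[:, j])
        X_hat[:, j] = B @ alpha
    return X_hat
```

The point of the per-feature fit is that, unlike SGC's fixed filter S^K, the coefficients are data-dependent, which is what allows the filter to adapt to heterophilous graphs.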
Open Source Code: Yes
  "We release code in the form of a Jupyter notebook (Pérez & Granger, 2007) demo which is available at github.com/schariya/adaptive-simple-convolution."
Open Datasets: Yes
  "We experiment on 10 commonly-used datasets, the same collection of datasets as Chien et al. (2021). CORA, CITESEER, and PUBMED are citation networks which are common benchmarks for node classification (Sen et al., 2008; Namata et al., 2012)..." (The quote continues with the split description repeated verbatim under Dataset Splits below.)
Dataset Splits: Yes
  "Like Chien et al. (2021), we use random 60%/20%/20% splits as training/validation/test data for the 5 heterophilous datasets, as in Pei et al. (2020), and use random 2.5%/2.5%/95% splits for the homophilous datasets, which is closer to the original setting from Kipf & Welling (2017) and Shchur et al. (2018)."
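For concreteness, here is a sketch of how such random splits might be generated; this is a hypothetical helper, not code taken from the released notebook.

```python
# Hypothetical helper illustrating the quoted random splits; not from the
# paper's released code.
import numpy as np

def random_split(n_nodes, train_frac, val_frac, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_nodes)
    n_tr = int(train_frac * n_nodes)
    n_va = int(val_frac * n_nodes)
    return perm[:n_tr], perm[n_tr:n_tr + n_va], perm[n_tr + n_va:]

# 60%/20%/20% for the 5 heterophilous datasets:
train_idx, val_idx, test_idx = random_split(n_nodes=2708, train_frac=0.60, val_frac=0.20)
# 2.5%/2.5%/95% for the homophilous datasets (2,708 is CORA's node count):
train_idx, val_idx, test_idx = random_split(n_nodes=2708, train_frac=0.025, val_frac=0.025)
```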
Hardware Specification: No
  The paper states: "As should be clear from the time complexity discussion in Section 3 and the dataset statistics in Table 1, our method is lightweight enough to run the benchmarks within a few hours on a laptop without a GPU, so compute is not a significant concern." However, it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts.
Software Dependencies: No
  The paper mentions: "The SGC and ASGC algorithms are implemented in Python using NumPy (Harris et al., 2020) for least squares regression and other linear algebraic computations. We use scikit-learn (Pedregosa et al., 2011) for logistic regression with 1,000 maximum iterations and otherwise default settings." However, it does not provide specific version numbers for Python, NumPy, or scikit-learn.
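The quoted setup pins down one concrete scikit-learn call; a sketch follows, where X_hat, y, and the index arrays are placeholders carried over from the earlier sketches.

```python
# Sketch of the quoted classifier setup: scikit-learn logistic regression
# with 1,000 maximum iterations and otherwise default settings. X_hat, y,
# and the index arrays are placeholders from the earlier sketches.
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000)  # all other hyperparameters left at defaults
clf.fit(X_hat[train_idx], y[train_idx])
val_acc = clf.score(X_hat[val_idx], y[val_idx])
```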
Experiment Setup: Yes
  "We tune the number of hops over K ∈ {1, 2, 4, 8}, roughly covering the range analyzed in Wu et al. (2019), and the regularization strength R = nR' over log10(R') ∈ {−4, −3, −2, −1, 0}. This dependency on the number of nodes n allows the regularization loss to scale with the least squares loss, which generally grows linearly with n. In Appendix 9.2, we report results for some additional experiments investigating the effect of fixing these hyperparameters. ... For our implementations of SGC and ASGC, we treat each network as undirected, in that if edge (i, j) appears, we also include edge (j, i). Like Chien et al. (2021), we use random 60%/20%/20% splits as training/validation/test data for the 5 heterophilous datasets, as in Pei et al. (2020), and use random 2.5%/2.5%/95% splits for the homophilous datasets, which is closer to the original setting from Kipf & Welling (2017) and Shchur et al. (2018)."
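Putting the quoted pieces together, the tuning loop implied by this setup looks roughly as follows; the glue code is hypothetical, and only the grids, the scaling R = n·R', and the undirected preprocessing come from the paper.

```python
# Hypothetical glue code for the quoted grid search. asgc_style_filter and
# the split/classifier variables come from the sketches above; adj, X, and
# y are the dataset's adjacency, features, and labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

adj = np.maximum(adj, adj.T)  # undirected: include (j, i) whenever (i, j) appears
n = adj.shape[0]

best_acc, best_cfg = -1.0, None
for K in (1, 2, 4, 8):                    # number of hops
    for log10_Rp in (-4, -3, -2, -1, 0):  # log10 of R'
        R = n * 10.0 ** log10_Rp          # regularization scales with n, like the LS loss
        X_hat = asgc_style_filter(adj, X, K=K, reg=R)
        clf = LogisticRegression(max_iter=1000).fit(X_hat[train_idx], y[train_idx])
        acc = clf.score(X_hat[val_idx], y[val_idx])
        if acc > best_acc:
            best_acc, best_cfg = acc, (K, log10_Rp)
```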