Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs
Authors: Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, Danai Koutra
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our empirical analysis shows that the identified designs increase the accuracy of GNNs by up to 40% and 27% over models without them on synthetic and real networks with heterophily, respectively, and yield competitive performance under homophily." [...] "Extensive Empirical Evaluation: We empirically analyze our model and competitive existing GNN models on both synthetic and real networks covering the full spectrum of low-to-high homophily" |
| Researcher Affiliation | Academia | Jiong Zhu University of Michigan jiongzhu@umich.edu Yujun Yan University of Michigan yujunyan@umich.edu Lingxiao Zhao Carnegie Mellon University lingxia1@andrew.cmu.edu Mark Heimann University of Michigan mheimann@umich.edu Leman Akoglu Carnegie Mellon University lakoglu@andrew.cmu.edu Danai Koutra University of Michigan dkoutra@umich.edu |
| Pseudocode | Yes | We describe H2GCN, which exemplifies how effectively combining designs D1-D3 can help better adapt to the whole spectrum of low-to-high homophily, while avoiding interference with other designs. It has three stages (Alg. 1, App. D): |
| Open Source Code | Yes | We compare it to prior GNN models, and make our code and data available at https://github.com/GemsLab/H2GCN. |
| Open Datasets | Yes | "We generate synthetic graphs with various homophily ratios h (Tab. 3) by adopting an approach similar to [16]." [...] "We now evaluate the performance of our model and existing GNNs on a variety of real-world datasets [35, 29, 30, 22, 4, 31] with edge homophily ratio h ranging from strong heterophily to strong homophily, going beyond the traditional Cora, Pubmed and Citeseer graphs that have strong homophily (hence the good performance of existing GNNs on them)." |
| Dataset Splits | Yes | All methods share the same training, validation and test splits (25%, 25%, 50% per class), and we report the average accuracy and standard deviation (stdev) over three generated graphs per heterophily level and benchmark dataset. For all benchmarks (except Cora-Full), we use the feature vectors, class labels, and 10 random splits (48%/32%/20% of nodes per class for train/validation/test) provided by [26]. For Cora-Full, we generate 3 random splits, with 25%/25%/50% of nodes per class for train/validation/test. |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Quadro P6000 GPU used for this research. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers, such as Python or PyTorch versions, or specific library versions used for implementation or experiments. |
| Experiment Setup | Yes | We tune all the models on the same train/validation splits (see App. F for details). Appendix F states: 'We perform hyperparameter tuning over the following parameters for all models: learning rate (0.01 for GCN, GAT, GCN-Cheby, MixHop, GraphSAGE, H2GCN; 0.001 for MLP), weight decay (0.0005 for all), dropout (0.5 for all except MLP which is 0.0), and number of hidden units (256 for all). We train all models for a maximum of 1000 epochs, using Adam optimizer (with momentum 0.9 and decay 0.999), and full batch.' |
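The per-class splits reported above (e.g. 48%/32%/20% of nodes per class for train/validation/test) can be sketched as a simple stratified split. This is an illustration only: the function name `per_class_split` and the seed handling are our assumptions, not taken from [26] or the authors' released code.

```python
import numpy as np

def per_class_split(labels, fractions=(0.48, 0.32, 0.20), seed=0):
    """Split node indices into train/val/test, taking the given fractions
    of nodes *within each class* (stratified), as in the report above.
    Illustrative sketch -- not the authors' implementation."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)  # nodes of class c
        rng.shuffle(idx)
        n = len(idx)
        n_train = int(fractions[0] * n)
        n_val = int(fractions[1] * n)
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to test
    return np.array(train), np.array(val), np.array(test)
```

Swapping `fractions` to `(0.25, 0.25, 0.50)` would reproduce the 25%/25%/50% splits used for the synthetic graphs and Cora-Full.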
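The hyperparameters quoted from Appendix F can be collected as plain configuration data, which makes the per-model differences (MLP vs. the GNN models) explicit. The names `HYPERPARAMS`, `learning_rate_for`, and `dropout_for` are our own for illustration; only the values come from the paper.

```python
# Shared settings reported in Appendix F for all models.
HYPERPARAMS = {
    "weight_decay": 5e-4,        # 0.0005 for all models
    "hidden_units": 256,         # for all models
    "max_epochs": 1000,          # maximum training epochs
    "adam_betas": (0.9, 0.999),  # Adam "momentum 0.9 and decay 0.999"
}

def learning_rate_for(model: str) -> float:
    """0.01 for GCN, GAT, GCN-Cheby, MixHop, GraphSAGE, H2GCN; 0.001 for MLP."""
    return 0.001 if model == "MLP" else 0.01

def dropout_for(model: str) -> float:
    """0.5 for all models except MLP, which uses 0.0."""
    return 0.0 if model == "MLP" else 0.5
```

In a PyTorch setup these values would typically be passed as `torch.optim.Adam(params, lr=..., betas=HYPERPARAMS["adam_betas"], weight_decay=...)`, with full-batch training as stated.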