Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Authors: Derek Lim, Felix Hohne, Xiuyu Li, Sijia Linda Huang, Vaishnavi Gupta, Omkar Bhalerao, Ser-Nam Lim

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results with representative simple methods and GNNs across our proposed datasets show that LINKX achieves state-of-the-art performance for learning on non-homophilous graphs.
Researcher Affiliation | Collaboration | Derek Lim (Cornell University, dl772@cornell.edu); Felix Hohne (Cornell University, fmh42@cornell.edu); Xiuyu Li (Cornell University, xl289@cornell.edu); Sijia Linda Huang (Cornell University, sh837@cornell.edu); Vaishnavi Gupta (Cornell University, vg222@cornell.edu); Omkar Bhalerao (Cornell University, opb7@cornell.edu); Ser-Nam Lim (Facebook AI, sernam@gmail.com)
Pseudocode | No | The paper describes the LINKX model with a diagram, but it does not include a formal pseudocode or algorithm block. (A hedged sketch of the LINKX architecture is given below the table.)
Open Source Code | Yes | Our codes and data are available at https://github.com/CUAI/Non-Homophily-Large-Scale.
Open Datasets | Yes | Our codes and data are available at https://github.com/CUAI/Non-Homophily-Large-Scale. Here, we detail the non-homophilous datasets that we propose for graph machine learning evaluation. Our datasets and tasks span diverse application areas. Penn94 [67], Pokec [41], genius [43], and twitch-gamers [60] are online social networks, where the task is to predict reported gender, certain account labels, or use of explicit content on user accounts. For the citation networks arXiv-year [31] and snap-patents [42, 41], the goal is to predict the year of paper publication or the year that a patent is granted.
Dataset Splits | Yes | We run each method on the same five random 50/25/25 train/val/test splits for each dataset. (A split-generation sketch follows the table.)
Hardware Specification | Yes | This is especially important on the scale of the wiki dataset, where none of our tested methods other than MLP is capable of running on a Titan RTX GPU with 24 GB of GPU RAM (see Section 5).
Software Dependencies | No | We implement our models using PyTorch [56] and PyTorch Geometric [23]. We use the Optuna framework [2] for hyperparameter optimization and WandB [40] for logging. No specific version numbers for these software dependencies are provided.
Experiment Setup | Yes | All methods requiring gradient-based optimization are run for 500 epochs, with test performance reported for the learned parameters of highest validation performance... All methods are trained for 500 epochs with the Adam optimizer [37] using a learning rate of 0.01 and a weight decay of 0.0005 (unless otherwise specified). Dropout [65] with p = 0.5 and ELU [19] activations are used for all hidden layers of all MLPs and GNNs. (A training-setup sketch follows the table.)
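
Since the paper presents LINKX only as a diagram, the following is a minimal PyTorch sketch of the combination step as described there: one MLP embeds each node's adjacency row, another embeds its features, and a linear layer over their concatenation, added back to both embeddings, feeds a final classifier MLP. Layer counts, hidden sizes, activation choices, and the use of a dense adjacency are illustrative assumptions; the authors' sparse, scalable implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class LINKXSketch(nn.Module):
    """Hedged sketch of the LINKX architecture: separate MLPs for structure
    (adjacency rows) and features, combined and fed to a final MLP."""

    def __init__(self, num_nodes, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.mlp_a = nn.Sequential(nn.Linear(num_nodes, hidden_dim), nn.ReLU(),
                                   nn.Linear(hidden_dim, hidden_dim))
        self.mlp_x = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                                   nn.Linear(hidden_dim, hidden_dim))
        self.combine = nn.Linear(2 * hidden_dim, hidden_dim)
        self.mlp_f = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                   nn.Linear(hidden_dim, num_classes))

    def forward(self, adj_rows, x):
        h_a = self.mlp_a(adj_rows)   # embedding of graph structure only
        h_x = self.mlp_x(x)          # embedding of node features only
        h = self.combine(torch.cat([h_a, h_x], dim=-1)) + h_a + h_x
        return self.mlp_f(torch.relu(h))
```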
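The reported evaluation protocol uses five random 50/25/25 train/val/test node splits per dataset. A minimal sketch of how such splits could be generated is below; the fractions and the five repeats come from the report, while the function name and the use of NumPy are assumptions rather than the authors' exact split code.

```python
import numpy as np

def make_random_splits(num_nodes, train_frac=0.5, val_frac=0.25,
                       num_splits=5, seed=0):
    """Generate `num_splits` random 50/25/25 train/val/test node splits."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(num_splits):
        perm = rng.permutation(num_nodes)
        n_train = int(train_frac * num_nodes)
        n_val = int(val_frac * num_nodes)
        splits.append({
            "train": perm[:n_train],
            "valid": perm[n_train:n_train + n_val],
            "test": perm[n_train + n_val:],
        })
    return splits
```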
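The reported optimization defaults (500 epochs, Adam with learning rate 0.01 and weight decay 0.0005, dropout 0.5, ELU on hidden layers, test performance taken at the epoch of best validation performance) can be summarized in a short training sketch. The model dimensions, features, labels, and index tensors below are placeholders, not the paper's data.

```python
import torch
import torch.nn as nn

# Placeholder MLP with the reported hidden-layer choices: ELU + dropout 0.5.
model = nn.Sequential(
    nn.Linear(128, 64), nn.ELU(), nn.Dropout(p=0.5),
    nn.Linear(64, 5),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

features = torch.randn(1000, 128)        # placeholder node features
labels = torch.randint(0, 5, (1000,))    # placeholder class labels
train_idx = torch.arange(500)            # placeholder training indices

for epoch in range(500):                 # 500 epochs, as reported
    model.train()
    optimizer.zero_grad()
    out = model(features)
    loss = criterion(out[train_idx], labels[train_idx])
    loss.backward()
    optimizer.step()
    # Validation accuracy would be tracked here; test performance is
    # reported for the parameters with the best validation score.
```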