Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods
Authors: Derek Lim, Felix Hohne, Xiuyu Li, Sijia Linda Huang, Vaishnavi Gupta, Omkar Bhalerao, Ser-Nam Lim
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results with representative simple methods and GNNs across our proposed datasets show that LINKX achieves state-of-the-art performance for learning on non-homophilous graphs. |
| Researcher Affiliation | Collaboration | Derek Lim (Cornell University, dl772@cornell.edu); Felix Hohne (Cornell University, fmh42@cornell.edu); Xiuyu Li (Cornell University, xl289@cornell.edu); Sijia Linda Huang (Cornell University, sh837@cornell.edu); Vaishnavi Gupta (Cornell University, vg222@cornell.edu); Omkar Bhalerao (Cornell University, opb7@cornell.edu); Ser-Nam Lim (Facebook AI, sernam@gmail.com) |
| Pseudocode | No | The paper describes the LINKX model with a diagram, but it does not include a formal pseudocode or algorithm block; a hedged architecture sketch is provided below the table. |
| Open Source Code | Yes | Our codes and data are available at https://github.com/CUAI/Non-Homophily-Large-Scale. |
| Open Datasets | Yes | Our codes and data are available at https://github.com/CUAI/Non-Homophily-Large-Scale. Here, we detail the non-homophilous datasets that we propose for graph machine learning evaluation. Our datasets and tasks span diverse application areas. Penn94 [67], Pokec [41], genius [43], and twitch-gamers [60] are online social networks, where the task is to predict reported gender, certain account labels, or use of explicit content on user accounts. For the citation networks arXiv-year [31] and snap-patents [42, 41] the goal is to predict year of paper publication or the year that a patent is granted. |
| Dataset Splits | Yes | We run each method on the same five random 50/25/25 train/val/test splits for each dataset. A minimal split-generation sketch is given below the table. |
| Hardware Specification | Yes | This is especially important on the scale of the wiki dataset, where none of our tested methods other than MLP is capable of running on a Titan RTX GPU with 24 GB GPU RAM (see Section 5). |
| Software Dependencies | No | We implement our models using PyTorch [56] and PyTorch Geometric [23]. We use the Optuna framework [2] for hyperparameter optimization and WandB [40] for logging. No specific version numbers for these software dependencies are provided. |
| Experiment Setup | Yes | All methods requiring gradient-based optimization are run for 500 epochs, with test performance reported for the learned parameters of highest validation performance... All methods are trained for 500 epochs with Adam optimizer [37] using a learning rate of 0.01 and a weight decay of 0.0005 (unless otherwise specified). Dropout [65] with p = 0.5 and ELU [19] activations are used for all hidden layers of all MLPs and GNNs. |
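
Since the paper provides no algorithm block, the following is a minimal PyTorch sketch of LINKX as described in the paper: the adjacency rows and the node features are embedded by separate MLPs, fused with a linear layer over their concatenation plus residual connections, and classified by a final MLP. Class and layer names, widths, and the dense adjacency input are illustrative assumptions, not the authors' reference implementation (which is in the linked repository).

```python
import torch
import torch.nn as nn


class LINKXSketch(nn.Module):
    """Hedged sketch of LINKX: MLP over adjacency rows (MLP_A), MLP over node
    features (MLP_X), fusion via a linear layer plus residual connections,
    then a final classifier MLP (MLP_f). Sizes/activations are assumptions."""

    def __init__(self, num_nodes, in_feats, hidden, num_classes):
        super().__init__()
        # ELU follows the activation choice reported in the experiment setup.
        self.mlp_A = nn.Sequential(nn.Linear(num_nodes, hidden), nn.ELU(),
                                   nn.Linear(hidden, hidden))
        self.mlp_X = nn.Sequential(nn.Linear(in_feats, hidden), nn.ELU(),
                                   nn.Linear(hidden, hidden))
        self.W = nn.Linear(2 * hidden, hidden)
        self.mlp_f = nn.Sequential(nn.ELU(), nn.Linear(hidden, num_classes))

    def forward(self, A, X):
        # A: adjacency rows of shape [N, N] (dense here for simplicity);
        # X: node features of shape [N, F].
        h_A = self.mlp_A(A)
        h_X = self.mlp_X(X)
        h = self.W(torch.cat([h_A, h_X], dim=-1)) + h_A + h_X  # fuse + residuals
        return self.mlp_f(h)
```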
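The evaluation protocol uses five random 50/25/25 train/val/test node splits per dataset. Below is a minimal sketch of how such splits could be generated; the function name and seeding scheme are assumptions for illustration, and the benchmark's fixed splits ship with the authors' code release.

```python
import numpy as np


def random_splits(num_nodes, train_prop=0.5, val_prop=0.25, seed=0):
    """Generate one random 50/25/25 train/val/test split over node indices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(train_prop * num_nodes)
    n_val = int(val_prop * num_nodes)
    return (perm[:n_train],                 # train indices
            perm[n_train:n_train + n_val],  # val indices
            perm[n_train + n_val:])         # test indices


# Five splits, mirroring the evaluation protocol
splits = [random_splits(num_nodes=10000, seed=s) for s in range(5)]
```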
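The reported optimization setup (500 epochs of Adam with learning rate 0.01 and weight decay 0.0005, with test performance taken at the epoch of best validation performance) can be approximated by a loop like the one below. `model`, `criterion`, the tensors `A`, `X`, `y`, and the index arrays are assumed to be defined as in the sketches above; dropout (p = 0.5) and ELU activations would live inside the model.

```python
import torch

# Assumed to exist: model, criterion, A, X, y, train_idx, val_idx
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

best_val, best_state = -float("inf"), None
for epoch in range(500):
    model.train()
    optimizer.zero_grad()
    out = model(A, X)
    loss = criterion(out[train_idx], y[train_idx])
    loss.backward()
    optimizer.step()

    # Track the parameters with the highest validation accuracy.
    model.eval()
    with torch.no_grad():
        pred = model(A, X).argmax(dim=-1)
        val_acc = pred[val_idx].eq(y[val_idx]).float().mean().item()
    if val_acc > best_val:
        best_val = val_acc
        best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
```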