Graph-less Neural Networks: Teaching Old MLPs New Tricks Via Distillation

Authors: Shichang Zhang, Yozen Liu, Yizhou Sun, Neil Shah

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Under a production setting involving both transductive and inductive predictions across 7 datasets, GLNN accuracies improve over stand-alone MLPs by 12.36% on average and match GNNs on 6/7 datasets. Comprehensive analysis shows when and why GLNNs can achieve competitive accuracies to GNNs and suggests GLNN as a handy choice for latency-constrained applications." and "Evaluation Protocol. For all experiments in this section, we report the average and standard deviation over ten runs with different random seeds. Model performance is measured as accuracy, and results are reported on test data with the best model selected using validation data." (A sketch of this evaluation protocol appears after the table.)
Researcher Affiliation | Collaboration | Shichang Zhang (University of California, Los Angeles, shichang@cs.ucla.edu); Yozen Liu (Snap Inc., yliu2@snap.com); Yizhou Sun (University of California, Los Angeles, yzsun@cs.ucla.edu); Neil Shah (Snap Inc., nshah@snap.com)
Pseudocode | No | The paper describes the GLNN framework conceptually and mathematically (Equation 1), but does not provide pseudocode or an algorithm block. (A sketch of the Equation 1 objective appears after the table.)
Open Source Code | Yes | "Code available at https://github.com/snap-research/graphless-neural-networks"
Open Datasets | Yes | "Datasets. We consider all five datasets used in the CPF paper (Yang et al., 2021a), i.e. Cora, Citeseer, Pubmed, A-computer, and A-photo. To fully evaluate our method, we also include two larger OGB datasets (Hu et al., 2020), i.e. Arxiv and Products."
Dataset Splits | Yes | "Evaluation Protocol. For all experiments in this section, we report the average and standard deviation over ten runs with different random seeds. Model performance is measured as accuracy, and results are reported on test data with the best model selected using validation data. We also evaluate on V^U_obs containing the other 80% of the test data... For all datasets, we follow the setting in the original paper to split the data... For the OGB datasets, we follow the OGB official splits based on time and popularity for Arxiv and Products respectively." (A sketch of loading the OGB splits appears after the table.)
Hardware Specification | Yes | "We run all experiments on a machine with 80 Intel(R) Xeon(R) E5-2698 v4 @ 2.20GHz CPUs, and a single NVIDIA V100 GPU with 16GB RAM."
Software Dependencies | No | "The experiments on both baselines and our approach are implemented using PyTorch, the DGL (Wang et al., 2019) library for GNN algorithms, and Adam (Kingma & Ba, 2015) for optimization." While the paper names this software, it does not provide specific version numbers for PyTorch or DGL.
Experiment Setup | Yes | "The hyperparameters of GNN models on each dataset are taken from the best hyperparameters provided by the CPF paper and the OGB official examples. For the student MLPs and GLNNs, unless otherwise specified with -wi or -Li, we set the number of layers and the hidden dimension of each layer to be the same as the teacher GNN, so their total number of parameters stays the same as the teacher GNN. For GLNNs we do a hyperparameter search of learning rate from [0.01, 0.005, 0.001], weight decay from [0, 0.001, 0.002, 0.005, 0.01], and dropout from [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]." (A sketch of this hyperparameter search appears after the table.)
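
The sketches referenced in the table rows follow. First, the evaluation protocol from the Research Type row: mean and standard deviation of test accuracy over ten random seeds, with the reported model chosen by validation accuracy. This is a minimal sketch only; `train_glnn` is a hypothetical helper standing in for one full training run, not a function from the released code.

```python
import numpy as np

test_accs = []
for seed in range(10):
    # train_glnn is a placeholder for one full training run that performs
    # model selection on the validation split and returns (val_acc, test_acc)
    # for the checkpoint with the best validation accuracy.
    val_acc, test_acc = train_glnn(seed=seed)
    test_accs.append(test_acc)

# Report average and standard deviation over the ten runs.
print(f"test accuracy: {np.mean(test_accs):.2f} ± {np.std(test_accs):.2f}")
```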
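For the Pseudocode row: the paper defines GLNN through its training objective (Equation 1), a weighted combination of the cross-entropy loss on labeled nodes and a KL-divergence distillation loss against the teacher GNN's soft labels. Below is a sketch of that objective, assuming the convention in which the weight lambda scales the label term; the function and variable names are illustrative, not taken from the paper's code.

```python
import torch.nn.functional as F

def glnn_loss(student_logits, labels, teacher_soft_labels, labeled_idx, lam=0.5):
    """Sketch of the Equation 1 objective: lam * L_label + (1 - lam) * L_teacher.

    student_logits: [N, C] MLP outputs for all nodes
    labels: [N] ground-truth classes (only rows in labeled_idx are used)
    teacher_soft_labels: [N, C] class probabilities from the trained teacher GNN
    labeled_idx: indices of labeled nodes
    lam: trade-off between the label loss and the distillation loss
    """
    # Cross-entropy on the labeled nodes only.
    label_loss = F.cross_entropy(student_logits[labeled_idx], labels[labeled_idx])
    # KL divergence between student log-probabilities and teacher probabilities on all nodes.
    kd_loss = F.kl_div(F.log_softmax(student_logits, dim=1),
                       teacher_soft_labels, reduction="batchmean")
    return lam * label_loss + (1 - lam) * kd_loss
```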
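For the Dataset Splits row: the OGB datasets ship with official splits (time-based for Arxiv, popularity-based for Products). As a hedged sketch of how such a split is typically obtained with the `ogb` and DGL packages (the dataset identifier below is the standard OGB name, not a detail quoted from the paper):

```python
from ogb.nodeproppred import DglNodePropPredDataset

# Load ogbn-arxiv together with its official time-based split.
dataset = DglNodePropPredDataset(name="ogbn-arxiv")
split_idx = dataset.get_idx_split()   # dict with 'train', 'valid', 'test' node indices
graph, labels = dataset[0]            # DGL graph and node labels
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]
```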
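Finally, for the Experiment Setup row: the GLNN hyperparameters are chosen by a grid search over learning rate, weight decay, and dropout, selected on validation accuracy. A minimal sketch of that search, assuming a hypothetical `train_and_evaluate` helper that runs one GLNN training with the given configuration and returns its validation accuracy:

```python
import itertools

learning_rates = [0.01, 0.005, 0.001]
weight_decays = [0, 0.001, 0.002, 0.005, 0.01]
dropouts = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

best_config, best_val_acc = None, 0.0
for lr, wd, dr in itertools.product(learning_rates, weight_decays, dropouts):
    # train_and_evaluate is a placeholder for one full GLNN training run.
    val_acc = train_and_evaluate(lr=lr, weight_decay=wd, dropout=dr)
    if val_acc > best_val_acc:
        best_config, best_val_acc = (lr, wd, dr), val_acc
```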