Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs
Authors: Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that KRD improves over the vanilla MLPs by 12.62% and outperforms its corresponding teacher GNNs by 2.16% averaged over 7 datasets and 3 GNN architectures. |
| Researcher Affiliation | Academia | 1AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China. |
| Pseudocode | Yes | Algorithm 1 Algorithm for KRD framework (Transductive) |
| Open Source Code | Yes | Codes are publicly available at: https://github.com/LirongWu/RKD. |
| Open Datasets | Yes | The effectiveness of the KRD framework is evaluated on seven real-world datasets, including Cora (Sen et al., 2008), Citeseer (Giles et al., 1998), Pubmed (McCallum et al., 2000), Coauthor-CS, Coauthor-Physics, Amazon Photo (Shchur et al., 2018), and ogbn-arxiv (Hu et al., 2020). |
| Dataset Splits | Yes | Concretely, the inputs and outputs of the two settings are: (1) Transductive: training on X and Y_L, and testing on (X_U, Y_U). (2) Inductive: training on X_L ∪ X_U^obs and Y_L, and testing on (X_U^ind, Y_U^ind)... For a fairer comparison, the model with the highest validation accuracy is selected for testing. (A minimal sketch of the two splits follows the table.) |
| Hardware Specification | Yes | implemented based on the standard implementation in the DGL library (Wang et al., 2019) using PyTorch 1.6.0 with an Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and an NVIDIA V100 GPU. |
| Software Dependencies | Yes | implemented based on the standard implementation in the DGL library (Wang et al., 2019) using PyTorch 1.6.0 with an Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and an NVIDIA V100 GPU. |
| Experiment Setup | Yes | The following hyperparameters are set the same for all datasets: epochs E = 500, noise variance δ = 1.0, and momentum rate η = 0.99 (0.9 for ogbn-arxiv). The other dataset-specific hyperparameters are determined by the AutoML toolkit NNI with the following hyperparameter search spaces: hidden dimension F = {128, 256, 512, 1024, 2048}, layer number L = {2, 3}, distillation temperature τ = {0.8, 0.9, 1.0, 1.1, 1.2}, loss weight α = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5}, learning rate lr = {0.001, 0.005, 0.01}, and weight decay decay = {0.0, 0.0005, 0.001}. (The corresponding NNI search space is sketched below.) |
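The transductive and inductive protocols quoted in the Dataset Splits row can be made concrete with a short sketch. This is a minimal illustration using DGL's built-in Cora loader, not the authors' code; the 80/20 split of the unlabeled nodes into observed and inductive subsets is an assumed ratio for demonstration only.

```python
import torch
import dgl

# Minimal sketch of the transductive vs. inductive splits (not the authors' code).
dataset = dgl.data.CoraGraphDataset()
g = dataset[0]

labeled = g.ndata["train_mask"].nonzero(as_tuple=True)[0]       # nodes with labels Y_L
unlabeled = (~g.ndata["train_mask"]).nonzero(as_tuple=True)[0]  # X_U

# Transductive: train on the full feature matrix X and labels Y_L,
# then evaluate on the unlabeled nodes (X_U, Y_U); nothing is withheld.

# Inductive: hold part of X_U out entirely. The 80/20 ratio is an assumption.
perm = torch.randperm(len(unlabeled))
n_obs = int(0.8 * len(unlabeled))
obs_nodes = unlabeled[perm[:n_obs]]   # X_U^obs: visible (but unlabeled) at training
ind_nodes = unlabeled[perm[n_obs:]]   # X_U^ind: unseen until test time

# The training graph contains only labeled + observed nodes; the inductive
# nodes, and every edge touching them, are removed along with them.
train_g = dgl.node_subgraph(g, torch.cat([labeled, obs_nodes]))
```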
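The dataset-specific hyperparameter search described in the Experiment Setup row maps directly onto an NNI search space. Below is a minimal sketch in NNI's standard "choice" format; only the candidate values come from the quoted text, while the parameter key names, the trial command, the TPE tuner, and the trial budget are illustrative assumptions.

```python
from nni.experiment import Experiment

# NNI "choice" search space mirroring the values quoted above.
# Key names (hidden_dim, num_layers, ...) are illustrative assumptions.
search_space = {
    "hidden_dim":   {"_type": "choice", "_value": [128, 256, 512, 1024, 2048]},
    "num_layers":   {"_type": "choice", "_value": [2, 3]},
    "tau":          {"_type": "choice", "_value": [0.8, 0.9, 1.0, 1.1, 1.2]},
    "alpha":        {"_type": "choice", "_value": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]},
    "lr":           {"_type": "choice", "_value": [0.001, 0.005, 0.01]},
    "weight_decay": {"_type": "choice", "_value": [0.0, 0.0005, 0.001]},
}

experiment = Experiment("local")
experiment.config.search_space = search_space
experiment.config.trial_command = "python train.py"  # hypothetical entry point
experiment.config.trial_concurrency = 1
experiment.config.tuner.name = "TPE"                  # assumed tuner choice
experiment.config.max_trial_number = 50               # assumed trial budget
# experiment.run(8080) would launch the search on a local web portal.
```

The fixed values from the same row (E = 500, δ = 1.0, η = 0.99) stay out of the search space, since the paper reports them as shared across all datasets.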