Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs

Authors: Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments show that KRD improves over the vanilla MLPs by 12.62% and outperforms its corresponding teacher GNNs by 2.16%, averaged over 7 datasets and 3 GNN architectures. |
| Researcher Affiliation | Academia | AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China. |
| Pseudocode | Yes | Algorithm 1: Algorithm for KRD framework (Transductive). (A loose sketch of the core step follows the table.) |
| Open Source Code | Yes | Codes are publicly available at: https://github.com/LirongWu/RKD. |
| Open Datasets | Yes | The effectiveness of the KRD framework is evaluated on seven real-world datasets, including Cora (Sen et al., 2008), Citeseer (Giles et al., 1998), Pubmed (McCallum et al., 2000), Coauthor-CS, Coauthor-Physics, Amazon Photo (Shchur et al., 2018), and ogbn-arxiv (Hu et al., 2020). |
| Dataset Splits | Yes | Concretely, the input and output of the two settings are: (1) Transductive: training on X and Y_L and testing on (X_U, Y_U). (2) Inductive: training on X_L ∪ X_U^obs and Y_L and testing on (X_U^ind, Y_U^ind)... For a fairer comparison, the model with the highest validation accuracy is selected for testing. (A split sketch follows the table.) |
| Hardware Specification | Yes | Implemented based on the standard implementation in the DGL library (Wang et al., 2019) using PyTorch 1.6.0 with an Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and an NVIDIA V100 GPU. |
| Software Dependencies | Yes | Implemented based on the standard implementation in the DGL library (Wang et al., 2019) using PyTorch 1.6.0 with an Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and an NVIDIA V100 GPU. |
| Experiment Setup | Yes | The following hyperparameters are set the same for all datasets: epochs E = 500, noise variance δ = 1.0, and momentum rate η = 0.99 (0.9 for ogbn-arxiv). The other dataset-specific hyperparameters are determined by the AutoML toolkit NNI with the following search spaces: hidden dimension F = {128, 256, 512, 1024, 2048}, layer number L = {2, 3}, distillation temperature τ = {0.8, 0.9, 1.0, 1.1, 1.2}, loss weight α = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5}, learning rate lr = {0.001, 0.005, 0.01}, and weight decay = {0.0, 0.0005, 0.001}. (Loss and search-space sketches follow the table.) |
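Algorithm 1 itself is not reproduced in this card, but the quoted hyperparameters hint at its core step: KRD quantifies the reliability of each node's knowledge by how invariant the teacher's predictive entropy is under Gaussian feature noise of variance δ, and a momentum rate η is quoted, presumably for smoothing these scores across training. The sketch below is a loose, assumed rendering of that scoring step, not the authors' algorithm; `teacher` is assumed to be a callable from node features to logits (e.g., a GNN with the graph closed over).

```python
import torch

@torch.no_grad()
def knowledge_reliability(teacher, x, delta=1.0, n_samples=5, eps=1e-12):
    """Assumed sketch: score each node by how little the teacher's predictive
    entropy drifts under Gaussian feature noise of variance delta.
    This paraphrases KRD's idea; it is not the paper's exact formula."""
    def entropy(logits):
        p = torch.softmax(logits, dim=-1)
        return -(p * (p + eps).log()).sum(dim=-1)

    base = entropy(teacher(x))                 # per-node entropy on clean features
    drift = torch.zeros_like(base)
    for _ in range(n_samples):
        noisy = x + delta ** 0.5 * torch.randn_like(x)   # std = sqrt(variance)
        drift += (entropy(teacher(noisy)) - base).abs()
    drift /= n_samples
    return 1.0 / (1.0 + drift)                 # more noise-invariant => more reliable
```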
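The Dataset Splits row distinguishes transductive from inductive evaluation. As a concrete illustration, the sketch below builds the two node partitions with boolean masks; the variable names and split sizes are hypothetical, chosen only to mirror the notation X_L, X_U^obs, and X_U^ind.

```python
import torch

# Hypothetical node counts (Cora-like); the paper's actual splits may differ.
N = 2708
labeled = torch.zeros(N, dtype=torch.bool)
labeled[:140] = True                       # L: labeled training nodes
observed = torch.zeros(N, dtype=torch.bool)
observed[140:2208] = True                  # U_obs: unlabeled nodes visible at training time
held_out = ~(labeled | observed)           # U_ind: nodes hidden until inference

# Transductive: train on all features X and the labels Y_L; test on (X_U, Y_U).
transductive_train = torch.arange(N)
transductive_test = torch.nonzero(~labeled).squeeze(1)

# Inductive: train on X_L ∪ X_U^obs and Y_L; test on (X_U^ind, Y_U^ind).
inductive_train = torch.nonzero(labeled | observed).squeeze(1)
inductive_test = torch.nonzero(held_out).squeeze(1)
```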
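The distillation temperature τ and loss weight α from the Experiment Setup row enter a standard GNN-to-MLP distillation objective: a weighted sum of the supervised cross-entropy and a temperature-scaled KL term against the teacher's soft labels. The sketch below shows only this generic form; KRD's contribution, reweighting distillation targets by the estimated reliability of the teacher's knowledge, is not reproduced here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, labeled_mask,
                      tau=1.0, alpha=0.3):
    """Generic GNN-to-MLP distillation: weighted CE + temperature-scaled KL.
    A common formulation assumed for illustration, not KRD's exact objective."""
    # Supervised term on labeled nodes only.
    ce = F.cross_entropy(student_logits[labeled_mask], labels[labeled_mask])
    # Soft-label term on all nodes, with the usual tau^2 gradient rescaling.
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return alpha * ce + (1.0 - alpha) * kl
```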
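Finally, the dataset-specific hyperparameters are searched with the NNI toolkit. A minimal sketch of the quoted grids in NNI's standard `choice` search-space format follows; the key names are assumptions, since the authors' actual NNI configuration is not shown in the paper.

```python
# NNI search space for the quoted hyperparameter grids ("choice" is NNI's
# built-in categorical sampler). Key names are illustrative, not the authors'.
search_space = {
    "hidden_dim":   {"_type": "choice", "_value": [128, 256, 512, 1024, 2048]},
    "num_layers":   {"_type": "choice", "_value": [2, 3]},
    "tau":          {"_type": "choice", "_value": [0.8, 0.9, 1.0, 1.1, 1.2]},
    "alpha":        {"_type": "choice", "_value": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]},
    "lr":           {"_type": "choice", "_value": [0.001, 0.005, 0.01]},
    "weight_decay": {"_type": "choice", "_value": [0.0, 0.0005, 0.001]},
}
```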