Multi-Scale Distillation from Multiple Graph Neural Networks

Authors: Chunhai Zhang, Jie Liu, Kai Dang, Wenzheng Zhang (pp. 4337-4344)

AAAI 2022 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted to evaluate the proposed method on four public datasets. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods.
Researcher Affiliation | Collaboration | (1) College of Artificial Intelligence, Nankai University, Tianjin, China; (2) Cloopen AI Research, Beijing, China
Pseudocode | Yes | Algorithm 1: Multi-scale Knowledge Distillation. (See the distillation-objective sketch after this table.)
Open Source Code | Yes | Our code is publicly available at https://github.com/NKU-IIPLab/MSKD.
Open Datasets | Yes | We conduct a series of node classification tasks on four different datasets, i.e., PPI (Zitnik and Leskovec 2017), Cora, CiteSeer, and PubMed (Sen et al. 2008). (See the data-loading sketch after this table.)
Dataset Splits | Yes | PPI contains 24 graphs that come from different human tissues and 121 categories; 20 graphs are used for training, 2 for validation, and the remaining 2 for testing.
Hardware Specification | No | The paper does not explicitly state the hardware used for the experiments (e.g., GPU model, CPU type, memory).
Software Dependencies | No | The paper mentions software components such as GAT, GCN, and the Adam optimizer but does not provide version numbers for these or for libraries such as PyTorch or TensorFlow.
Experiment Setup | Yes | In the teacher GAT, each hidden layer has 4 attention heads and 256 hidden features, and the output layer has 6 attention heads and K hidden features. The student GAT has 5 layers; each hidden layer has 2 attention heads and 68 hidden features, and the output layer has 2 attention heads and K hidden features. The per-layer hidden-feature settings are the same in GCN. In all methods, the optimizer is Adam, the learning rate is 0.005, training runs for 500 epochs, and weight decay is 0. All other hyperparameters are tuned to the best results on the validation set. λ in Equation (8) is set to 7, 3, 3, and 4 on the four datasets, respectively. (See the configuration sketch after this table.)
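
The official code at https://github.com/NKU-IIPLab/MSKD was not inspected for this summary, so the library choice below is an assumption. The sketch shows one common way to obtain the four datasets and the reported 20/2/2 PPI graph split using DGL's built-in loaders.

```python
from dgl.data import (CiteseerGraphDataset, CoraGraphDataset,
                      PPIDataset, PubmedGraphDataset)

# PPI ships with the standard graph-level split quoted above:
# 20 training graphs, 2 validation graphs, 2 test graphs.
ppi_train = PPIDataset(mode="train")  # 20 graphs, 121 labels per node (multi-label)
ppi_valid = PPIDataset(mode="valid")  # 2 graphs
ppi_test = PPIDataset(mode="test")    # 2 graphs

# The citation datasets are single graphs; their splits come as boolean node masks.
cora = CoraGraphDataset()[0]
train_mask = cora.ndata["train_mask"]
val_mask = cora.ndata["val_mask"]
test_mask = cora.ndata["test_mask"]
```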
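
For the "Experiment Setup" row, a minimal sketch of a GAT and optimizer matching the reported widths, heads, learning rate, and weight decay, assuming DGL's GATConv. The teacher depth, the ELU activation, and the example Cora sizes (1433 input features, K = 7 classes) are assumptions not stated in the quoted text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GATConv


class GAT(nn.Module):
    """Stack of GATConv layers; heads are concatenated in hidden layers and averaged at the output."""

    def __init__(self, in_feats, hidden_feats, num_classes, hidden_heads, out_heads, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        self.layers.append(GATConv(in_feats, hidden_feats, hidden_heads, activation=F.elu))
        for _ in range(num_layers - 2):
            self.layers.append(
                GATConv(hidden_feats * hidden_heads, hidden_feats, hidden_heads, activation=F.elu)
            )
        self.layers.append(GATConv(hidden_feats * hidden_heads, num_classes, out_heads))

    def forward(self, g, h):
        for layer in self.layers[:-1]:
            h = layer(g, h).flatten(1)        # concatenate attention heads
        return self.layers[-1](g, h).mean(1)  # average heads in the output layer


in_feats, K = 1433, 7  # e.g., Cora; dataset-dependent
teacher = GAT(in_feats, 256, K, hidden_heads=4, out_heads=6, num_layers=3)  # teacher depth assumed
student = GAT(in_feats, 68, K, hidden_heads=2, out_heads=2, num_layers=5)   # 5 layers as reported

# Reported training settings: Adam, lr = 0.005, 500 epochs, weight decay = 0.
optimizer = torch.optim.Adam(student.parameters(), lr=0.005, weight_decay=0.0)
```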
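
The paper's Algorithm 1 and its multi-scale loss are not reproduced in the quoted text, so the sketch below is a generic teacher-student distillation objective, not the authors' method; it only illustrates where a weight like the λ of Equation (8) (reported as 7, 3, 3, and 4 on the four datasets) enters. The temperature and the single-label cross-entropy term (appropriate for the citation datasets; PPI is multi-label and would use a BCE term instead) are assumptions.

```python
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, lam=7.0, temperature=2.0):
    """Generic KD objective: supervised term plus a soft-label matching term weighted by lam."""
    # Supervised loss on the ground-truth labels (single-label case).
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label loss: the student matches the teacher's tempered class distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return ce + lam * kd
```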