Multi-Scale Distillation from Multiple Graph Neural Networks
Authors: Chunhai Zhang, Jie Liu, Kai Dang, Wenzheng Zhang
AAAI 2022, pp. 4337-4344
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted to evaluate the proposed method on four public datasets. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods. |
| Researcher Affiliation | Collaboration | 1College Of Artificial Intelligence, Nankai University, Tianjin, China 2Cloopen AI Research, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Multi-scale Knowledge Distillation. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/NKU-IIPLab/MSKD. |
| Open Datasets | Yes | We conduct a series of node classification tasks on four different datasets, i.e., PPI (Zitnik and Leskovec 2017), Cora, CiteSeer and PubMed (Sen et al. 2008). |
| Dataset Splits | Yes | PPI contains 24 graphs that come from different human tissues and 121 categories, where 20 graphs are used for training, 2 graphs are used for validation and the remaining 2 graphs are used for testing. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running the experiments (e.g., GPU model, CPU type, memory). |
| Software Dependencies | No | The paper mentions software components like GAT, GCN, and Adam optimizer but does not provide specific version numbers for these or other libraries like PyTorch or TensorFlow. |
| Experiment Setup | Yes | In teacher GAT, each hidden layer has 4 attention heads and 256 hidden features, and the output layer has 6 attention heads and K hidden features. In student GAT, there are 5 layers, each hidden layer has 2 attention heads and 68 hidden features, and the output layer has 2 attention heads and K hidden features. The number of hidden features in each layer is the same in GCN. In all the methods, the optimizer is Adam, the learning rate is set to 0.005, training epochs are 500 and weight decay equals 0. We tune all other hyperparameters to the best results on the validation set. λ in Equation (8) is set to 7, 3, 3 and 4 for the four datasets. A hedged sketch of this setup follows the table. |
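The Experiment Setup row specifies concrete teacher and student GAT architectures. The sketch below is a minimal, hedged reconstruction of those configurations, assuming PyTorch Geometric's `GATConv`; the teacher's depth, the input feature size (50, as in the standard PPI benchmark), and the head-averaging at the output layer are not stated in the excerpt and are assumptions, not the authors' released implementation (see the MSKD repository linked above for that).

```python
# Sketch of the reported GAT configurations, assuming PyTorch Geometric.
# The teacher depth (3 layers) and output-head averaging are assumptions.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv


class GAT(torch.nn.Module):
    def __init__(self, in_feats, hidden_feats, num_classes,
                 num_layers, hidden_heads, out_heads):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        # First hidden layer maps raw features to hidden_feats per head.
        self.convs.append(GATConv(in_feats, hidden_feats, heads=hidden_heads))
        # Remaining hidden layers take the concatenated heads as input.
        for _ in range(num_layers - 2):
            self.convs.append(GATConv(hidden_feats * hidden_heads,
                                      hidden_feats, heads=hidden_heads))
        # Output layer averages its heads (concat=False) to produce K logits.
        self.convs.append(GATConv(hidden_feats * hidden_heads, num_classes,
                                  heads=out_heads, concat=False))

    def forward(self, x, edge_index):
        for conv in self.convs[:-1]:
            x = F.elu(conv(x, edge_index))
        return self.convs[-1](x, edge_index)


K = 121          # number of PPI labels (multi-label classification)
IN_FEATS = 50    # PPI node feature dimension

# Teacher: 4 heads / 256 hidden features per hidden layer, 6 heads at the output.
teacher = GAT(IN_FEATS, hidden_feats=256, num_classes=K,
              num_layers=3, hidden_heads=4, out_heads=6)

# Student: 5 layers, 2 heads / 68 hidden features per hidden layer, 2 output heads.
student = GAT(IN_FEATS, hidden_feats=68, num_classes=K,
              num_layers=5, hidden_heads=2, out_heads=2)
```

Continuing from the modules above, the next sketch wires in the reported training settings (Adam, learning rate 0.005, 500 epochs, weight decay 0) and a λ-weighted combination of task and distillation losses in the spirit of Equation (8). The exact multi-scale distillation loss is not given in this excerpt, so `distill_loss` is a hypothetical placeholder; likewise, training against a single frozen teacher on a single PPI training graph is a simplification of the paper's multi-teacher setup and the 20/2/2 graph split noted in the Dataset Splits row.

```python
from torch_geometric.datasets import PPI

# Load one PPI training graph for illustration; the full split has 20/2/2 graphs.
data = PPI(root='data/PPI', split='train')[0]
x, edge_index, y = data.x, data.edge_index, data.y

# In the paper the teacher(s) would already be trained; here we just freeze one.
teacher.eval()

optimizer = torch.optim.Adam(student.parameters(), lr=0.005, weight_decay=0.0)
lam = 7.0  # λ in Equation (8); the paper reports 7, 3, 3 and 4 for the four datasets


def distill_loss(student_out, teacher_out):
    # Placeholder for the paper's multi-scale distillation term.
    return F.mse_loss(student_out, teacher_out)


for epoch in range(500):  # 500 training epochs, as reported
    student.train()
    optimizer.zero_grad()
    s_logits = student(x, edge_index)
    with torch.no_grad():
        t_logits = teacher(x, edge_index)
    # PPI is multi-label, so the task loss is binary cross-entropy with logits.
    task_loss = F.binary_cross_entropy_with_logits(s_logits, y)
    loss = task_loss + lam * distill_loss(s_logits, t_logits)
    loss.backward()
    optimizer.step()
```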