Data-Free Adversarial Knowledge Distillation for Graph Neural Networks
Authors: Yuanxin Zhuang, Lingjuan Lyu, Chuan Shi, Carl Yang, Lichao Sun
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various benchmark models and six representative datasets demonstrate that our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task. |
| Researcher Affiliation | Collaboration | Yuanxin Zhuang (Beijing University of Posts and Telecommunications), Lingjuan Lyu (Sony AI), Chuan Shi (Beijing University of Posts and Telecommunications), Carl Yang (Emory University), and Lichao Sun (Lehigh University) |
| Pseudocode | Yes | Algorithm 1 DFAD-GNN |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to the source code for the described methodology. |
| Open Datasets | Yes | We adopt six graph classification benchmark datasets including three bioinformatics graph datasets, i.e., MUTAG, PTC MR, and PROTEINS, and three social network graph datasets, i.e., IMDB-BINARY, COLLAB, and REDDIT-BINARY. ... dataset split is based on the conventionally used training/test splits [Niepert et al., 2016; Zhang et al., 2018; Xu et al., 2018] with LIBSVM [Chang and Lin, 2011]. |
| Dataset Splits | Yes | for all experiments on these datasets, we evaluate the model performance with a 10-fold cross validation setting, where the dataset split is based on the conventionally used training/test splits [Niepert et al., 2016; Zhang et al., 2018; Xu et al., 2018] |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions software like 'Adam optimizer' and 'LIBSVM' but does not specify version numbers for any of its software dependencies. |
| Experiment Setup | Yes | For training, we use Adam optimizer with weight decay 5e-4 to update student models. The generator is trained with Adam without weight decay. Both student and generator are using a learning rate scheduler that multiplies the learning rate by a factor 0.3 at 10%, 30%, and 50% of the training epochs. The number of updates k of the student model in Algorithm 1 is set to 5. The threshold τ is empirically set to 0.5. ... We use 5 layers with 128 hidden units for teacher models. For the student model, we conduct experiments to gradually reduce the number of layers l ∈ {5, 3, 2, 1} and gradually reduce the number of hidden units h ∈ {128, 64, 32, 16}. |
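
The experiment-setup row above can be illustrated with a minimal PyTorch sketch of the reported training configuration. This is an assumption-laden reconstruction rather than the authors' code (no source code is released): the MLP placeholder networks, the flat feature tensors standing in for graphs, the learning rate of 1e-3, the epoch/step/batch counts, and the L1 discrepancy loss are all hypothetical. Only the optimizer settings (Adam with weight decay 5e-4 for the student, Adam without weight decay for the generator), the 0.3 learning-rate decay at 10%, 30%, and 50% of the training epochs, the k = 5 student updates per generator update, and the teacher/student layer and hidden-unit sizes are taken from the table; the threshold τ = 0.5 is not modeled here, and the adversarial student-vs-generator structure is inferred from the paper's title and Algorithm 1.

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder MLPs over flat feature vectors stand in for the paper's graph
# teacher/student networks and graph generator (hypothetical, for illustration only).
def mlp(in_dim, hidden, out_dim, layers):
    mods, d = [], in_dim
    for _ in range(layers - 1):
        mods += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    mods.append(nn.Linear(d, out_dim))
    return nn.Sequential(*mods)

feat_dim, num_classes, noise_dim = 64, 2, 32
teacher = mlp(feat_dim, 128, num_classes, layers=5)   # teacher: 5 layers, 128 hidden units (per the table)
student = mlp(feat_dim, 64, num_classes, layers=3)    # one of the smaller student variants
generator = mlp(noise_dim, 128, feat_dim, layers=3)   # maps noise to pseudo-sample features

epochs, steps_per_epoch, batch, k = 200, 50, 32, 5    # k = 5 student updates per generator update

# Student: Adam with weight decay 5e-4; generator: Adam without weight decay.
opt_s = Adam(student.parameters(), lr=1e-3, weight_decay=5e-4)
opt_g = Adam(generator.parameters(), lr=1e-3)

# Learning rate multiplied by 0.3 at 10%, 30%, and 50% of the training epochs.
milestones = [int(epochs * p) for p in (0.1, 0.3, 0.5)]
sched_s = MultiStepLR(opt_s, milestones=milestones, gamma=0.3)
sched_g = MultiStepLR(opt_g, milestones=milestones, gamma=0.3)

l1 = nn.L1Loss()                                      # stand-in for the distillation discrepancy
teacher.eval()                                        # teacher is pretrained and frozen

for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        # k student updates: imitate the frozen teacher on generated samples.
        for _ in range(k):
            x = generator(torch.randn(batch, noise_dim)).detach()
            with torch.no_grad():
                t_out = teacher(x)
            loss_s = l1(student(x), t_out)
            opt_s.zero_grad(); loss_s.backward(); opt_s.step()

        # One generator update: maximize student-teacher disagreement (adversarial step).
        x = generator(torch.randn(batch, noise_dim))
        loss_g = -l1(student(x), teacher(x))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    sched_s.step()
    sched_g.step()
```

The alternating inner loop mirrors the k = 5 setting quoted above: the student takes several imitation steps for every adversarial generator step, which is the usual way to keep the student close to the teacher while the generator searches for inputs on which they disagree.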