Data-Free Adversarial Knowledge Distillation for Graph Neural Networks

Authors: Yuanxin Zhuang, Lingjuan Lyu, Chuan Shi, Carl Yang, Lichao Sun

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various benchmark models and six representative datasets demonstrate that our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
Researcher Affiliation | Collaboration | Yuanxin Zhuang¹, Lingjuan Lyu², Chuan Shi¹, Carl Yang³ and Lichao Sun⁴ (¹Beijing University of Posts and Telecommunications, ²Sony AI, ³Emory University, ⁴Lehigh University)
Pseudocode | Yes | Algorithm 1 DFAD-GNN (a hedged sketch of this training loop appears below the table)
Open Source Code | No | The paper does not contain an explicit statement or link providing access to the source code for the described methodology.
Open Datasets | Yes | We adopt six graph classification benchmark datasets including three bioinformatics graph datasets, i.e., MUTAG, PTC MR, and PROTEINS, and three social network graph datasets, i.e., IMDB-BINARY, COLLAB, and REDDIT-BINARY. ... dataset split is based on the conventionally used training/test splits [Niepert et al., 2016; Zhang et al., 2018; Xu et al., 2018] with LIBSVM [Chang and Lin, 2011].
Dataset Splits | Yes | for all experiments on these datasets, we evaluate the model performance with a 10-fold cross validation setting, where the dataset split is based on the conventionally used training/test splits [Niepert et al., 2016; Zhang et al., 2018; Xu et al., 2018] (a minimal split sketch follows the table)
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as CPU or GPU models.
Software Dependencies | No | The paper mentions software like 'Adam optimizer' and 'LIBSVM' but does not specify version numbers for any of its software dependencies.
Experiment Setup | Yes | For training, we use Adam optimizer with weight decay 5e-4 to update student models. The generator is trained with Adam without weight decay. Both student and generator are using a learning rate scheduler that multiplies the learning rate by a factor 0.3 at 10%, 30%, and 50% of the training epochs. The number of updates k of the student model in Algorithm 1 is set to 5. The threshold τ is empirically set to 0.5. ... We use 5 layers with 128 hidden units for teacher models. For the student model, we conduct experiments to gradually reduce the number of layers l ∈ {5, 3, 2, 1} and gradually reduce the number of hidden units h ∈ {128, 64, 32, 16}. (a configuration sketch follows the table)
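
The "Dataset Splits" row quotes a 10-fold cross-validation protocol. The sketch below is a minimal, hedged illustration of how such folds are typically produced; the dummy label array (sized like MUTAG's 188 graphs), the random seed, and the use of scikit-learn's StratifiedKFold are assumptions, not the paper's own tooling.

```python
# Hypothetical 10-fold split sketch; only the "10-fold cross validation" part
# is taken from the paper, everything else is illustrative.
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.random.randint(0, 2, size=188)   # dummy stand-in for 188 MUTAG graph labels
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

for fold, (train_idx, test_idx) in enumerate(skf.split(np.zeros_like(labels), labels)):
    # train_idx would index the graphs used to train the teacher;
    # test_idx is held out to evaluate the distilled student.
    print(f"fold {fold}: {len(train_idx)} train graphs, {len(test_idx)} test graphs")
```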
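The "Pseudocode" row refers to Algorithm 1 (DFAD-GNN). The following is a hedged sketch of a data-free adversarial distillation loop of that general shape: a generator synthesizes pseudo-samples, the student is updated k = 5 times per iteration to match the frozen teacher, and the generator is then updated once to maximize the student-teacher discrepancy. The MLP stand-ins, noise dimension, batch size, iteration budget, and L1 discrepancy loss are all assumptions for illustration; the paper's actual models are GNNs operating on generated graph structures.

```python
# Hedged sketch of a DFAD-style adversarial distillation loop, NOT the paper's
# exact Algorithm 1. Only k = 5 and the student's weight decay come from the
# quoted setup; all other choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NOISE_DIM, FEAT_DIM, NUM_CLASSES = 32, 128, 2   # illustrative sizes
K_STUDENT_UPDATES = 5                           # k = 5, as stated in the setup
ITERATIONS, BATCH = 100, 16                     # assumed training budget

# MLP stand-ins; the paper uses GNNs over generated graphs.
teacher = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))
student = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))
generator = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
for p in teacher.parameters():                  # the pretrained teacher stays frozen
    p.requires_grad_(False)
teacher.eval()

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3, weight_decay=5e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

def discrepancy(s_logits, t_logits):
    # L1 distance between student and teacher outputs -- a common choice in
    # data-free adversarial distillation; the exact loss here is an assumption.
    return F.l1_loss(s_logits, t_logits)

for it in range(ITERATIONS):
    # Student phase: k updates pulling the student toward the teacher on
    # generator-produced pseudo-samples (generator frozen via detach).
    for _ in range(K_STUDENT_UPDATES):
        fake = generator(torch.randn(BATCH, NOISE_DIM)).detach()
        loss_s = discrepancy(student(fake), teacher(fake))
        opt_s.zero_grad()
        loss_s.backward()
        opt_s.step()

    # Generator phase: one adversarial update pushing generated samples toward
    # regions where student and teacher still disagree. Gradients that reach
    # the student here are discarded by the next zero_grad call.
    fake = generator(torch.randn(BATCH, NOISE_DIM))
    loss_g = -discrepancy(student(fake), teacher(fake))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```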
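The "Experiment Setup" row can be turned into a concrete configuration. Below is a minimal sketch of the quoted optimizer and learning-rate schedule: Adam with weight decay 5e-4 for the student, Adam without weight decay for the generator, and both learning rates multiplied by 0.3 at 10%, 30%, and 50% of the training epochs. The epoch budget, learning rates, and placeholder modules are assumptions not stated in the excerpt.

```python
# Optimizer / LR-schedule sketch; weight decay, the 0.3 factor, and the
# 10%/30%/50% milestones are from the paper, the rest is assumed.
import torch
import torch.nn as nn

EPOCHS = 200                                   # assumed; not stated in the excerpt
student = nn.Linear(128, 2)                    # placeholder for the student GNN
generator = nn.Linear(32, 128)                 # placeholder for the graph generator

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3, weight_decay=5e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

milestones = [int(EPOCHS * p) for p in (0.1, 0.3, 0.5)]   # 10%, 30%, 50% of epochs
sched_s = torch.optim.lr_scheduler.MultiStepLR(opt_s, milestones=milestones, gamma=0.3)
sched_g = torch.optim.lr_scheduler.MultiStepLR(opt_g, milestones=milestones, gamma=0.3)

for epoch in range(EPOCHS):
    # ... run one training epoch (see the loop sketch above) ...
    sched_s.step()
    sched_g.step()
```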