Learning to Pre-train Graph Neural Networks

Authors: Yuanfu Lu, Xunqiang Jiang, Yuan Fang, Chuan Shi

AAAI 2021, pp. 4276-4284

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct a systematic empirical study on the pre-training of various GNN models, using both a public collection of protein graphs and a new compilation of bibliographic graphs for pre-training. Experimental results show that L2P-GNN is capable of learning effective and transferable prior knowledge that yields powerful representations for downstream tasks."
Researcher Affiliation | Collaboration | Yuanfu Lu (1,2), Xunqiang Jiang (1), Yuan Fang (3), Chuan Shi (1,4); affiliations: (1) Beijing University of Posts and Telecommunications; (2) WeChat Search Application Department, Tencent Inc., China; (3) Singapore Management University; (4) Peng Cheng Laboratory, Shenzhen, China
Pseudocode | Yes | "Detailed pseudocode of the algorithm is in supplemental material, Appendix."
Open Source Code | Yes | "Code and datasets are available at https://github.com/rootlu/L2P-GNN."
Open Datasets | Yes | "The biology graphs come from a public repository [1], covering 394,925 protein subgraphs (Marinka et al. 2019). We further present a new collection of bibliographic graphs called PreDBLP, purposely compiled for pre-training GNNs based on DBLP [2], which contains 1,054,309 paper subgraphs in 31 fields." [1] http://snap.stanford.edu/gnn-pretrain [2] https://dblp.uni-trier.de. The new bibliographic dataset is publicly released.
Dataset Splits | Yes | "For both domains, we split downstream data with 8:1:1 ratio for train/validation/test sets." (A minimal split sketch follows the table.)
Hardware Specification | No | The paper states "The hyper-parameter settings and experimental environment are discussed in Appendix." but does not provide specific hardware details in the main text.
Software Dependencies | No | The paper mentions that "Implementation details are presented in Appendix." but does not provide specific software names with version numbers in the main text.
Experiment Setup | Yes | "Lastly, we investigate the effect of the number of node- and graph-level adaptation steps (s, t), as well as the dimension of node representations. We plot the performance of L2P-GNN under combinations of 0 ≤ s ≤ 3 and 0 ≤ t ≤ 3 in Fig. 3(b). We find that L2P-GNN is robust to different values of s and t, except when one or both of them are zero (i.e., no adaptation at all). In particular, L2P-GNN can adapt quickly with only one gradient update in both adaptations (i.e., s = t = 1). Finally, we summarize the impact of the dimension in Fig. 3(c). We observe that L2P-GNN achieves the optimal performance when the dimension is 300 and is generally stable around the optimal setting, indicating that L2P-GNN is robust w.r.t. the representation dimension." (See the adaptation sketch after the table.)
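
To make the Dataset Splits row concrete, below is a minimal sketch of an 8:1:1 train/validation/test split over downstream examples. The function name `split_indices`, the fixed seed, and the plain index shuffle are illustrative assumptions, not details taken from the paper.

```python
import random

def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split n downstream examples into train/val/test index lists (8:1:1)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an assumption, not from the paper
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Usage on a hypothetical downstream set of 10,000 graphs:
train_idx, val_idx, test_idx = split_indices(10_000)
```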
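The Experiment Setup row refers to s node-level and t graph-level adaptation steps, i.e. a MAML-style inner loop of gradient updates before the meta-update. The sketch below illustrates what such s/t inner-loop updates could look like in PyTorch; the `adapt` helper, the loss functions, learning rates, and data names in the commented usage are hypothetical simplifications, not the authors' implementation.

```python
import torch

def adapt(params, loss_fn, data, steps, lr):
    """Take `steps` inner-loop gradient updates of `params` on `loss_fn`."""
    for _ in range(steps):
        loss = loss_fn(params, data)
        # create_graph=True keeps the inner updates differentiable,
        # so the outer (meta) gradient can flow through them
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

# One meta-training step (hypothetical losses, data, and optimizer):
# params = list(gnn.parameters())
# params = adapt(params, node_level_loss, support_edges, steps=s, lr=alpha)
# params = adapt(params, graph_level_loss, support_graph, steps=t, lr=beta)
# meta_loss = query_loss(params, query_edges)  # backprop through both loops
# meta_loss.backward(); meta_optimizer.step()
```

This matches the paper's observation that s = t = 1 already suffices: a single differentiable update per level keeps the inner loop cheap while still adapting the shared parameters to each task.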