Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks

Authors: Jiarong Xu, Renhong Huang, Xin Jiang, Yuxuan Cao, Carl Yang, Chunping Wang, Yang Yang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment results show that the proposed APT is able to obtain an efficient pre-training model with fewer training data and better downstream performance. (Section 4: Experiments)
Researcher Affiliation | Collaboration | Jiarong Xu (1), Renhong Huang (2), Xin Jiang (3), Yuxuan Cao (2), Carl Yang (4), Chunping Wang (5), Yang Yang (2); affiliations: (1) Fudan University, (2) Zhejiang University, (3) Lehigh University, (4) Emory University, (5) FinVolution Group
Pseudocode | Yes | The overall algorithm for APT is given in Algorithm 1.
Open Source Code | Yes | We provide an open-source implementation of our model APT at https://github.com/galina0217/APT.
Open Datasets | Yes | The datasets for pre-training and testing, along with their statistics, are listed in Appendix D. Pre-training datasets are collected from different domains, including social, citation, and movie networks. The graph datasets for pre-training and testing in this paper are collected from a wide spectrum of domains (see Table 3 for an overview). ... from Open Graph Benchmark [10].
Dataset Splits | No | The paper states 'For each dataset, we consistently use 90% of the data as the training set, and 10% as the testing set,' but does not explicitly mention a separate validation split or cross-validation setup. (A minimal sketch of such a 90/10 split is given after the table.)
Hardware Specification | Yes | We conduct all experiments on a single Linux machine with an Intel Xeon Gold 5118 CPU (128 GB memory) and a GeForce GTX Tesla P4 GPU (8 GB memory).
Software Dependencies | Yes | Our model is implemented under the following software settings: PyTorch 1.4.0+cu100, CUDA 10.0, networkx 2.3, DGL 0.4.3post2, sklearn 0.20.3, numpy 1.19.4, Python 3.7.1.
Experiment Setup | Yes | In the training phase, we aim to utilize data from different domains to pre-train one graph model. We iteratively select graphs for pre-training until the predictive uncertainty of every candidate graph is below 3.5. For each selected graph, we choose samples with predictive uncertainty higher than 3. We set M, the number of subgraph instances queried in each graph for uncertainty estimation, to 500. The time-adaptive parameter γ_t in Eq. (4) follows γ_t ~ Beta(1, β_t), where β_t = 3 · 0.995^t. We set the trade-off parameter λ = 10 for APT-L2 and λ = 500 for APT. The total iteration number is 100. We adopt GCC as the backbone pre-training model with its default hyper-parameters, including its subgraph instance definition. In the fine-tuning phase, we select logistic regression or SVM as the downstream classifier and adopt the same setting as GCC. (A sketch that wires these hyperparameters into the selection loop follows the table.)
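
The experiment-setup row above pins down several concrete hyperparameters: uncertainty thresholds of 3.5 (graph level) and 3 (sample level), M = 500 queried subgraphs, a Beta(1, β_t) schedule with β_t = 3 · 0.995^t, λ = 500 (or 10 for APT-L2), and 100 iterations. The sketch below wires those numbers into a toy selection loop purely to make the schedule and stopping criteria concrete. It is only a sketch: the uncertainty estimators are hypothetical stand-ins simulated with random numbers rather than the GCC-based estimates in the authors' implementation, and the pre-training step itself is elided.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters quoted in the setup row above.
M = 500                  # subgraph instances queried per graph for uncertainty estimation
GRAPH_THRESHOLD = 3.5    # stop once every candidate graph's uncertainty is below this
SAMPLE_THRESHOLD = 3.0   # within a selected graph, keep samples whose uncertainty exceeds this
T = 100                  # total number of pre-training iterations
LAMBDA = 500             # trade-off parameter for APT (10 for APT-L2); not used in this toy loop

def sample_gamma(t: int) -> float:
    """Time-adaptive parameter gamma_t ~ Beta(1, beta_t) with beta_t = 3 * 0.995**t.
    beta_t decays with t, so draws of gamma_t drift toward 1 as training proceeds."""
    beta_t = 3.0 * 0.995 ** t
    return float(rng.beta(1.0, beta_t))

# Hypothetical stand-ins (random numbers) for the model-based uncertainty estimates.
def graph_uncertainty(graph) -> float:
    """Placeholder: average predictive uncertainty over M subgraphs queried from `graph`."""
    return float(np.mean(rng.uniform(2.0, 5.0, size=M)))

def sample_uncertainties(graph) -> np.ndarray:
    """Placeholder: per-subgraph predictive uncertainty for M sampled subgraphs."""
    return rng.uniform(2.0, 5.0, size=M)

candidates = [f"graph_{i}" for i in range(6)]   # toy stand-ins for the cross-domain graphs

for t in range(T):
    scores = {g: graph_uncertainty(g) for g in candidates}
    g_best, u_best = max(scores.items(), key=lambda kv: kv[1])
    if u_best < GRAPH_THRESHOLD:
        break                                    # no candidate graph is uncertain enough
    keep = sample_uncertainties(g_best) > SAMPLE_THRESHOLD
    gamma_t = sample_gamma(t)
    print(f"iter {t:3d}: select {g_best}, keep {int(keep.sum())}/{M} samples, gamma_t={gamma_t:.3f}")
    # ... one round of GCC-style contrastive pre-training on the kept samples would go here ...
```

Run as-is, this only prints which toy graph would be selected each round, how many of its sampled subgraphs pass the uncertainty filter, and how the γ_t draws shift over time; in the real pipeline, the uncertainty values come from the partially pre-trained GCC encoder.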
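
For the dataset-splits row: the paper reports a 90%/10% train/test split with no separate validation set. Below is a minimal sketch of such a split; the use of scikit-learn's train_test_split, the fixed random seed, and the stratification by label are illustrative assumptions, not the authors' exact procedure.

```python
from sklearn.model_selection import train_test_split

def split_indices(indices, labels):
    """90% train / 10% test, as stated in the paper; no validation split is described.
    Stratifying by label and fixing the seed are assumptions made for this sketch."""
    return train_test_split(indices, test_size=0.10, random_state=0, stratify=labels)

# Usage (hypothetical): train_idx, test_idx = split_indices(list(range(len(labels))), labels)
```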