Strategies for Pre-training Graph Neural Networks
Authors: Weihua Hu*, Bowen Liu*, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, Jure Leskovec
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically study pre-training on multiple graph classification datasets. We find that naïve strategies... improves generalization significantly across downstream tasks, leading up to 9.4% absolute improvements in ROC-AUC over non-pre-trained models and achieving state-of-the-art performance for molecular property prediction and protein function prediction. 5 EXPERIMENTS, 5.1 DATASETS, 5.3 RESULTS, Table 1: Test ROC-AUC (%) performance... |
| Researcher Affiliation | Academia | Weihua Hu1, Bowen Liu2, Joseph Gomes4, Marinka Zitnik5, Percy Liang1, Vijay Pande3, Jure Leskovec1; 1Department of Computer Science, 2Chemistry, 3Bioengineering, Stanford University; 4Department of Chemical and Biochemical Engineering, The University of Iowa; 5Department of Biomedical Informatics, Harvard University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project website, data and code: http://snap.stanford.edu/gnn-pretrain |
| Open Datasets | Yes | We release the new datasets at: http://snap.stanford.edu/gnn-pretrain. For the chemistry domain, we use 2 million unlabeled molecules sampled from the ZINC15 database (Sterling & Irwin, 2015)... For graph-level multi-task supervised pre-training, we use a preprocessed ChEMBL dataset (Mayr et al., 2018; Gaulton et al., 2011)... as our downstream tasks, we decided to use 8 larger binary classification datasets contained in MoleculeNet (Wu et al., 2018)... |
| Dataset Splits | Yes | The split for train/validation/test sets is 80%:10%:10%. ... The effective split ratio for the train/validation/prior/test sets is 69% : 12% : 9.5% : 9.5%. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only general information about training time. |
| Software Dependencies | Yes | We use Pytorch (Paszke et al., 2017) and Pytorch Geometric (Fey & Lenssen, 2019) for all of our implementation. |
| Experiment Setup | Yes | We select the following hyper-parameters that performed well across all downstream tasks in the validation sets: 300 dimensional hidden units, 5 GNN layers (K = 5), and average pooling for the READOUT function. ... All models are trained with Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.001. ... For self-supervised pre-training, we use a batch size of 256, while for supervised pre-training, we use a batch size of 32 with dropout rate of 20%. ... We use a batch size of 32 and dropout rate of 50%. ... train models for 100 epochs, while on the protein function prediction dataset...we train models for 50 epochs. |
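
The Experiment Setup and Software Dependencies rows together pin down most of the downstream fine-tuning configuration (5 GIN layers, 300-dimensional hidden units, average pooling as READOUT, Adam with a learning rate of 0.001, batch size 32, 50% dropout). The sketch below is a minimal PyTorch Geometric reconstruction of that configuration, not the authors' released code (which is available at http://snap.stanford.edu/gnn-pretrain): the input feature dimension and task count are placeholders, and the paper's edge-feature-aware GIN variant and its pre-training stages are omitted.

```python
# Minimal sketch of the reported fine-tuning configuration:
# 5 GIN layers, 300-dim hidden units, mean pooling, Adam (lr=1e-3),
# dropout 0.5. NUM_NODE_FEATURES and NUM_TASKS are placeholders;
# edge-feature handling and the pre-training stages are not shown.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GINConv, global_mean_pool

NUM_NODE_FEATURES = 32   # placeholder; the paper uses categorical atom features
NUM_TASKS = 12           # placeholder; depends on the downstream dataset
HIDDEN_DIM, NUM_LAYERS, DROPOUT = 300, 5, 0.5


class GNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        self.bns = torch.nn.ModuleList()
        in_dim = NUM_NODE_FEATURES
        for _ in range(NUM_LAYERS):
            mlp = torch.nn.Sequential(
                torch.nn.Linear(in_dim, HIDDEN_DIM),
                torch.nn.ReLU(),
                torch.nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
            )
            self.convs.append(GINConv(mlp))
            self.bns.append(torch.nn.BatchNorm1d(HIDDEN_DIM))
            in_dim = HIDDEN_DIM
        # Graph-level prediction head for multi-task binary classification.
        self.head = torch.nn.Linear(HIDDEN_DIM, NUM_TASKS)

    def forward(self, x, edge_index, batch):
        for conv, bn in zip(self.convs, self.bns):
            x = F.relu(bn(conv(x, edge_index)))
            x = F.dropout(x, p=DROPOUT, training=self.training)
        x = global_mean_pool(x, batch)  # READOUT: average pooling
        return self.head(x)


model = GNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

Under the quoted setup, a fine-tuning run would load pre-trained GNN weights before adding the prediction head, then iterate over a DataLoader with batch size 32 for 100 epochs on the molecular property prediction datasets (50 epochs for protein function prediction).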