Graph Contrastive Learning Automated

Authors: Yuning You, Tianlong Chen, Yang Shen, Zhangyang Wang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that JOAO performs on par with or sometimes better than the state-of-the-art competitors, including GraphCL, on multiple graph datasets of various scales and types, yet without resorting to any laborious dataset-specific tuning on augmentation selection.
Researcher Affiliation | Academia | Yuning You (1), Tianlong Chen (2), Yang Shen (1), Zhangyang Wang (2). (1) Texas A&M University; (2) The University of Texas at Austin. Correspondence to: Yang Shen <yshen@tamu.edu>, Zhangyang Wang <atlaswang@utexas.edu>.
Pseudocode | Yes | Algorithm 1 (AGD for optimization (3)). Input: initial parameter θ^(0), sampling distribution P^(0)(A1, A2), number of optimization steps N. For n = 1 to N: (1) upper-level minimization: fix P(A1, A2) = P^(n-1)(A1, A2) and call equation (4) to update θ^(n); (2) lower-level maximization: fix θ = θ^(n) and call equation (9) to update P^(n)(A1, A2). Return: optimized parameter θ^(N). (A PyTorch sketch of this alternating loop is given after the table.)
Open Source Code | Yes | We release the code at https://github.com/Shen-Lab/GraphCL_Automated.
Open Datasets | Yes | Datasets. We use datasets of diverse nature from the benchmark TUDataset (Morris et al., 2020), including graph data for small molecules & proteins (Riesen & Bunke, 2008; Dobson & Doig, 2003), computer vision (Nene et al., 1996), and various relation networks (Yanardag & Vishwanathan, 2015; Rozemberczki et al., 2020) of diverse statistics (see Table S1 of Appendix B), under semi-supervised and unsupervised learning. Additionally, we gather domain-specific bioinformatics datasets from the benchmark (Hu et al., 2019) of relatively similar statistics (see Table S2 of Appendix B), under transfer-learning tasks for predicting molecules' chemical properties or proteins' biological functions. Lastly, we take two large-scale benchmark datasets, ogbg-ppa & ogbg-code, from Open Graph Benchmark (OGB) (Hu et al., 2020a) (see Table S3 of Appendix B for statistics) to evaluate scalability under semi-supervised learning. (A dataset-loading sketch is given after the table.)
Dataset Splits | Yes | Learning protocols. Learning experiments are performed in three settings, following the same protocols as in the SOTA work. (1) In semi-supervised learning (You et al., 2020a), on datasets without an explicit training/validation/test split, we perform pre-training with all data and finetuning & evaluation with K folds, where K = 1/(label rate); on datasets with a train/validation/test split, we perform pre-training on the training data only, finetuning on part of the training data, and evaluation on the validation/test sets. (A fold-construction sketch is given after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments; it only mentions the GNN architectures used.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). It mentions GNN architectures such as ResGCN and GIN, but not the software environment or libraries used to implement or run them.
Experiment Setup | Yes | GNN architectures & augmentations. We adopt the same GNN architectures with default hyper-parameters as in the SOTA methods under the individual experiment settings. Specifically, (1) in semi-supervised learning, ResGCN (Chen et al., 2019) is used with 5 layers and 128 hidden dimensions; (2) in unsupervised representation learning, GIN (Xu et al., 2018) is used with 3 layers and 32 hidden dimensions; and (3) in transfer learning and on large-scale OGB datasets, GIN is used with 5 layers and 300 hidden dimensions. We also adopt the same graph data augmentations as in GraphCL (You et al., 2020a) with the default augmentation strength 0.2, and tune the trade-off hyper-parameter γ in optimization (3) over {0.01, 0.1, 1}. (A configuration sketch is given after the table.)
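
To make Algorithm 1 concrete, below is a minimal PyTorch sketch of the alternating gradient descent (AGD) loop. Everything here is an assumption for illustration: `encoder`, `loader`, and `aug_pairs` (a list of `(aug1, aug2)` view-generating callables) are hypothetical placeholders, `nt_xent` is a generic contrastive loss standing in for GraphCL's, and the lower-level update is a simplified softmax re-weighting rather than the paper's exact equation (9).

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Generic NT-Xent contrastive loss between two views (an
    approximation of the GraphCL-style loss, not the paper's exact one)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(z1.size(0))       # positives on the diagonal
    return F.cross_entropy(sim, targets)

def joao_agd_sketch(encoder, loader, aug_pairs, num_steps, gamma=0.1, lr=0.01):
    """Hypothetical sketch of Algorithm 1 (AGD). `aug_pairs` is a list of
    (aug1, aug2) callables returning two augmented views of a batch."""
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    # P^(0): start from a uniform distribution over augmentation pairs.
    prob = torch.full((len(aug_pairs),), 1.0 / len(aug_pairs))

    for n in range(num_steps):
        # 1. Upper-level minimization: fix P(A1, A2), update theta (eq. (4)).
        for batch in loader:
            i = torch.multinomial(prob, 1).item()
            aug1, aug2 = aug_pairs[i]
            loss = nt_xent(encoder(aug1(batch)), encoder(aug2(batch)))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # 2. Lower-level maximization: fix theta, update P (eq. (9)).
        # Simplified proxy: re-weight toward pairs with the highest
        # contrastive loss, with gamma trading off against uniformity.
        with torch.no_grad():
            batch = next(iter(loader))
            losses = torch.stack([
                nt_xent(encoder(a1(batch)), encoder(a2(batch)))
                for a1, a2 in aug_pairs
            ])
            prob = F.softmax(losses / gamma, dim=0)

    return encoder, prob
```

The softmax step only encodes the min-max intuition (harder augmentation pairs get sampled more often); consult the paper's equation (9) and the released repository for the exact projected update.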
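Both benchmark suites are publicly downloadable. A minimal loading sketch, assuming PyTorch Geometric and the `ogb` package are installed; the dataset name 'PROTEINS' and the root paths are example choices, not prescribed by the paper:

```python
from torch_geometric.datasets import TUDataset
from ogb.graphproppred import PygGraphPropPredDataset

# TUDataset benchmark (Morris et al., 2020); 'PROTEINS' is one example name.
tu = TUDataset(root='data/TUDataset', name='PROTEINS')
print(len(tu), tu.num_classes)  # number of graphs and classes

# Open Graph Benchmark datasets used for the scalability study.
ogb = PygGraphPropPredDataset(name='ogbg-ppa', root='data/OGB')
split_idx = ogb.get_idx_split()  # OGB ships an official train/valid/test split
```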
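The K = 1/(label rate) protocol can be sketched with scikit-learn's `StratifiedKFold`. This only reconstructs the fold arithmetic; how each fold feeds finetuning versus evaluation follows You et al. (2020a) and is not reproduced here:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def label_rate_folds(labels, label_rate=0.1, seed=0):
    """Sketch of the K = 1/(label rate) split: partition the data into K
    stratified folds so each fold holds one label-rate's worth of graphs."""
    k = int(round(1.0 / label_rate))  # e.g. 10% label rate -> K = 10 folds
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    labels = np.asarray(labels)
    for rest_idx, fold_idx in skf.split(np.zeros(len(labels)), labels):
        # `fold_idx` is 1/K of the data (a labeled finetuning portion);
        # `rest_idx` is the remainder.
        yield fold_idx, rest_idx
```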
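Finally, the reported setup condenses into a small configuration table, shown below alongside a generic GIN encoder in PyTorch Geometric. The numbers come straight from the paper; the encoder itself is a standard GIN construction under the stated depth and width, not necessarily the authors' exact architecture:

```python
import torch
from torch.nn import Linear, ReLU, Sequential
from torch_geometric.nn import GINConv, global_add_pool

# Per-setting hyper-parameters reported in the paper.
CONFIGS = {
    'semi_supervised':  dict(arch='ResGCN', layers=5, hidden=128),
    'unsupervised':     dict(arch='GIN',    layers=3, hidden=32),
    'transfer_and_ogb': dict(arch='GIN',    layers=5, hidden=300),
}
GAMMA_GRID = [0.01, 0.1, 1]  # trade-off hyper-parameter gamma in (3)
AUG_STRENGTH = 0.2           # default augmentation strength from GraphCL

class GINEncoder(torch.nn.Module):
    """Generic GIN graph encoder matching the stated depth/width."""
    def __init__(self, in_dim, hidden, layers):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        for i in range(layers):
            mlp = Sequential(Linear(in_dim if i == 0 else hidden, hidden),
                             ReLU(), Linear(hidden, hidden))
            self.convs.append(GINConv(mlp))

    def forward(self, x, edge_index, batch):
        for conv in self.convs:
            x = conv(x, edge_index).relu()
        return global_add_pool(x, batch)  # graph-level readout
```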