Automated Self-Supervised Learning for Graphs
Authors: Wei Jin, Xiaorui Liu, Xiangyu Zhao, Yao Ma, Neil Shah, Jiliang Tang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By evaluating the framework on 8 real-world datasets, our experimental results show that AUTOSSL can significantly boost the performance on downstream tasks including node clustering and node classification compared with training under individual tasks. |
| Researcher Affiliation | Collaboration | Wei Jin Michigan State University jinwei2@msu.edu; Xiaorui Liu Michigan State University xiaorui@msu.edu; Xiangyu Zhao City University of Hong Kong xy.zhao@cityu.edu.hk; Yao Ma New Jersey Institute of Technology yao.ma@njit.edu; Neil Shah Snap Inc. nshah@snap.com; Jiliang Tang Michigan State University tangjili@msu.edu |
| Pseudocode | Yes | C ALGORITHM The detailed algorithm for AUTOSSL-ES is shown in Algorithm 1. Concretely, for each round (iteration) of AUTOSSL-ES, we sample K sets of task weights, i.e., K different combinations of SSL tasks, from a multivariate normal distribution. Then we train K graph neural networks independently on each set of task weights. Afterwards, we calculate the pseudo-homophily for each network and adjust the mean and variance of the multivariate normal distribution through CMA-ES based on their pseudo-homophily. The detailed algorithm for AUTOSSL-DS is summarized in Algorithm 2. Specifically, we first update the GNN parameter θ through one step gradient descent; then we perform k-means clustering to obtain centroids, which are used to calculate the homophily loss H. Afterwards, we calculate the meta-gradient with respect to {λi}, update {λi} through gradient descent and clip {λi} to [0, 1]. |
| Open Source Code | Yes | To ensure reproducibility of our experiments, we provide our source code at https://github.com/ChandlerBang/AutoSSL. |
| Open Datasets | Yes | We perform experiments on 8 real-world datasets widely used in the literature (Yang et al., 2016; Shchur et al., 2018; Mernyei & Cangea, 2020; Hu et al., 2020), i.e., Physics, CS, Photo, Computers, WikiCS, Citeseer, CoraFull, and ogbn-arxiv. [...] All datasets can be loaded from PyTorch Geometric (Fey & Lenssen, 2019). |
| Dataset Splits | Yes | For other datasets, we split the nodes into 10%/10%/80% for training/validation/test. |
| Hardware Specification | Yes | We perform experiments on one NVIDIA Tesla K80 GPU and one NVIDIA Tesla V100 GPU. Additionally, we use eight CPUs, with the model name as Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz. The operating system we use is CentOS Linux 7 (Core). |
| Software Dependencies | No | The paper mentions the use of "PyTorch Geometric" and "one-layer GCN", but does not provide specific version numbers for these software components or other libraries used in the experiments. |
| Experiment Setup | Yes | We set the size of hidden dimensions to 512, weight decay to 0, dropout rate to 0. For individual SSL methods and AUTOSSL-ES, we set learning rate to 0.001, use Adam optimizer (Kingma & Ba, 2014), train the models with 1000 epochs and adopt early stopping strategy. For AUTOSSL-DS, we train the models with 1000 epochs and choose the model checkpoint that achieves the highest pseudo-homophily. We use Adam optimizer for both inner and outer optimization. The learning rate for outer optimization is set to 0.05. For AUTOSSL-ES, we use a population size of 8 for each round. |
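The AUTOSSL-ES loop quoted in the Pseudocode row (sample K task-weight sets from a normal distribution, score each trained model by pseudo-homophily, update the sampling distribution) can be sketched as follows. This is a simplified illustration, not the authors' implementation: `pseudo_homophily` is a toy surrogate standing in for the real GNN training plus clustering evaluation, and the distribution update is a plain elite-mean shift rather than full CMA-ES covariance adaptation.

```python
import random

def pseudo_homophily(task_weights):
    # Toy surrogate score (purely illustrative): in the paper this would be
    # computed from the embeddings of a GNN trained with these SSL task weights.
    mean = sum(task_weights) / len(task_weights)
    return -sum((w - mean) ** 2 for w in task_weights)

def autossl_es(n_tasks=5, population=8, rounds=20, seed=0):
    """Simplified evolution-strategy search over SSL task weights.

    Each round: sample `population` weight sets from a per-task normal
    distribution, score each with the (surrogate) pseudo-homophily, then
    move the distribution mean toward the best half of the population.
    """
    rng = random.Random(seed)
    mu = [0.5] * n_tasks   # per-task mean of the sampling distribution
    sigma = 0.3            # shared standard deviation
    for _ in range(rounds):
        pop = []
        for _ in range(population):
            # Sample one candidate set of task weights, clipped to [0, 1]
            w = [min(1.0, max(0.0, rng.gauss(m, sigma))) for m in mu]
            pop.append((pseudo_homophily(w), w))
        pop.sort(key=lambda t: t[0], reverse=True)
        elite = [w for _, w in pop[: population // 2]]
        # Shift the mean toward the elite candidates; anneal exploration
        mu = [sum(ws) / len(elite) for ws in zip(*elite)]
        sigma *= 0.95
    return mu

best = autossl_es()
```

The paper's actual search uses CMA-ES (which also adapts the full covariance matrix) and a population size of 8 per round, matching the Experiment Setup row above.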