Graph Neural Architecture Search

Authors: Yang Gao, Hong Yang, Peng Zhang, Chuan Zhou, Yue Hu

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on real-world datasets demonstrate that GraphNAS can design a novel network architecture that rivals the best human-invented architectures in terms of validation set accuracy. Moreover, in a transfer learning task we observe that graph neural architectures designed by GraphNAS, when transferred to new datasets, still gain improvements in prediction accuracy.
Researcher Affiliation | Collaboration | Yang Gao (1,5), Hong Yang (2), Peng Zhang (3), Chuan Zhou (4,5) and Yue Hu (1,5). (1) Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; (2) Centre for Artificial Intelligence, University of Technology Sydney, Australia; (3) Ant Financial Services Group, Hangzhou, China; (4) Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; (5) School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Pseudocode | Yes | Algorithm 1: GraphNAS search algorithm
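The paper's Algorithm 1 is a reinforcement-learning search loop: a controller samples a candidate architecture, a child GNN is trained to produce a reward (validation accuracy), and the controller is updated by policy gradient. The toy sketch below illustrates that loop with independent per-slot softmax logits and a moving-average baseline; the actual controller in the paper is an LSTM, and `evaluate` stands in for training a child GNN, so this is an illustrative approximation rather than the authors' algorithm.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def graphnas_search(evaluate, choices_per_slot, steps=100, lr=0.1, seed=0):
    """Toy REINFORCE-style sketch of the search loop in Algorithm 1.

    evaluate: callable mapping a sampled architecture (list of choice
    indices, one per slot) to a reward, e.g. validation accuracy.
    choices_per_slot: number of options available at each architecture slot.
    """
    rng = np.random.default_rng(seed)
    logits = [np.zeros(n) for n in choices_per_slot]
    baseline = 0.0
    best, best_reward = None, -np.inf
    for _ in range(steps):
        # Sample one architecture: one choice index per slot.
        arch = [rng.choice(n, p=softmax(l)) for n, l in zip(choices_per_slot, logits)]
        reward = evaluate(arch)
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        # Policy-gradient update toward high-reward choices.
        for slot, action in enumerate(arch):
            p = softmax(logits[slot])
            grad = -p
            grad[action] += 1.0  # gradient of log pi(action | slot)
            logits[slot] += lr * (reward - baseline) * grad
        if reward > best_reward:
            best, best_reward = arch, reward
    return best, best_reward
```

With a cheap surrogate `evaluate`, the loop concentrates probability mass on high-reward choices; in the paper, each evaluation instead costs a full child-model training run, which is why the search budget (S = 2000 architectures) matters.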
Open Source Code | Yes | We have released the Python code on GitHub for comparison: https://github.com/GraphNAS/GraphNAS
Open Datasets | Yes | Datasets. We use three popular citation networks, i.e., Cora, Citeseer and Pubmed, as the testbed. To test the capability of transferring the architectures designed by GraphNAS, we use the co-author datasets MS-CS and MS-Physics, and the product networks Amazon Computers and Amazon Photos [Shchur et al., 2018].
Dataset Splits | Yes | In the semi-supervised learning task, the datasets follow the settings of [Kipf and Welling, 2017]: only 20 labels per class are used for training on each citation network, with 500 nodes for validation and 1,000 nodes for testing. In the supervised learning task, each split uses 500 nodes for validation, 500 nodes for testing, and the remaining nodes for training.
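The semi-supervised split above (20 training labels per class, 500 validation nodes, 1,000 test nodes) can be sketched as follows. The exact node selection in the paper follows the fixed splits of Kipf and Welling (2017); this random version is an illustrative approximation, not the authors' code.

```python
import numpy as np

def semi_supervised_split(labels, num_classes, seed=0):
    """Approximate the semi-supervised split: 20 labeled nodes per class
    for training, then 500 validation and 1,000 test nodes from the rest."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in range(num_classes):
        class_idx = np.flatnonzero(labels == c)
        train_idx.extend(rng.choice(class_idx, size=20, replace=False))
    train_idx = np.array(train_idx)
    # Remaining nodes supply the validation and test sets.
    rest = rng.permutation(np.setdiff1d(np.arange(len(labels)), train_idx))
    val_idx, test_idx = rest[:500], rest[500:1500]
    return train_idx, val_idx, test_idx
```

On a Cora-sized graph (2,708 nodes, 7 classes) this yields 140 training, 500 validation, and 1,000 test nodes, mirroring the counts quoted above.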
Hardware Specification | Yes | The experiments are run on a single NVIDIA 1080Ti.
Software Dependencies | No | The GNN architectures used in GraphNAS are implemented with PyG [Fey and Lenssen, 2019]. While PyG is mentioned, specific version numbers for PyG or other key software dependencies (e.g., Python, PyTorch) are not provided, which is necessary for reproducibility.
Experiment Setup | Yes | Hyper-parameters of the controller: the controller is a one-layer LSTM with 100 hidden units, trained with the ADAM optimizer at a learning rate of 0.00035. The weights of the controller are initialized uniformly between -0.1 and 0.1. To prevent premature convergence, a tanh constant of 2.5 and a temperature of 5.0 are applied to the sampling logits [Bello et al., 2017], and the controller's sample entropy is added to the reward, weighted by 0.0001. After GraphNAS searches S = 2000 architectures, the top K = 5 architectures with the best validation accuracy are collected; each is then trained N = 20 times to choose the best models. Each GNN designed by GraphNAS contains L = 2 layers for fair comparison. Hyper-parameters of the GNNs: once the controller samples an architecture, a child model is constructed and trained for 300 epochs, with L2 regularization λ = 0.0005, dropout probability p = 0.6, and learning rate lr = 0.005 as defaults. To achieve the best results, the GNN hyper-parameters are searched over the following space:
Hidden size: [8, 16, 32, 64, 128, 256, 512]
Learning rate: [1e-2, 1e-3, 1e-4, 5e-3, 5e-4]
Dropout: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
L2 regularization strength: [0, 1e-3, 1e-4, 1e-5, 5e-5, 5e-4]
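The GNN hyper-parameter grid quoted above can be written down directly. Sampling uniformly from it, as below, is a simple stand-in for however the authors actually tuned the top architectures (the paper does not state whether the search is exhaustive or random), so treat this as a sketch of the search space rather than their tuning procedure.

```python
import random

# Hyper-parameter search space for the child GNNs, as quoted in the paper.
SEARCH_SPACE = {
    "hidden_size": [8, 16, 32, 64, 128, 256, 512],
    "learning_rate": [1e-2, 1e-3, 1e-4, 5e-3, 5e-4],
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "weight_decay": [0, 1e-3, 1e-4, 1e-5, 5e-5, 5e-4],  # L2 strength
}

def sample_config(rng=random):
    """Draw one hyper-parameter configuration uniformly from the grid.
    The paper's defaults (lr = 0.005, dropout = 0.6, L2 = 0.0005) are
    all members of this grid."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
```

The grid contains 7 × 5 × 10 × 6 = 2,100 configurations, so exhaustive search per architecture would be far more expensive than the architecture search itself; random or partial search over this grid is the practical reading.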