One-shot Graph Neural Architecture Search with Dynamic Search Space

Authors: Yanxi Li, Zean Wen, Yunhe Wang, Chang Xu

AAAI 2021, pp. 8510-8517 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments of semi-supervised and supervised node classification on citation networks, including Cora, Citeseer, and Pubmed, demonstrate that our method outperforms the current state-of-the-art manually designed architectures and reaches competitive performance to existing GNN NAS approaches with up to 10 times of speedup.
Researcher Affiliation | Collaboration | Yanxi Li (1), Zean Wen (1), Yunhe Wang (2), Chang Xu (1); (1) School of Computer Science, University of Sydney, Australia; (2) Noah's Ark Lab, Huawei Technologies, China
Pseudocode | Yes | Algorithm 1: Search with dynamic search space
Open Source Code | No | No statement or link for open-source code for the described methodology is provided in the paper.
Open Datasets | Yes | We search for architectures and test their performances on citation networks for node classification, including Cora, Citeseer, and Pubmed. Following the splits used by Yang, Cohen, and Salakhudinov (2016) and Gao et al. (2020), for the semi-supervised learning, we use 20 nodes per class for training and use 500 and 1,000 nodes for validation and testing, respectively.
Dataset Splits | Yes | Following the splits used by Yang, Cohen, and Salakhudinov (2016) and Gao et al. (2020), for the semi-supervised learning, we use 20 nodes per class for training and use 500 and 1,000 nodes for validation and testing, respectively. For the full supervised learning, we follow the splits of Gao et al. (2020), where 500 nodes are used for validation, 500 nodes are used for testing, and all the remaining nodes are used for training. The detailed statistics of the dataset along with the splits are shown in Table 3. (A data-loading sketch reflecting these splits appears after this table.)
Hardware Specification | No | The paper mentions 'GPU hours' as a measure of search cost but does not specify any particular hardware components like GPU models, CPU models, or memory.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | The hyper-network has 16 channels and 2 layers. We follow Kipf and Welling (2016) and use a single GCN instead of MLP as the classifier. Both the network weights and the architecture parameters are optimized with ADAM. The network learning rate is set to 0.007, and the architecture learning rate is set to 0.1. The hyper-network is trained for 100 epochs. Dropout and weight decay are applied as regularization. We use a dropout rate of 60%, and use 3e-4 for the weight decay of network weights and 1e-3 for the weight decay of architecture parameters. The GCN classifier is excluded from weight decay.
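The semi-supervised splits quoted above (20 training nodes per class, 500 validation nodes, 1,000 test nodes) are the standard public Planetoid splits of Yang, Cohen, and Salakhudinov (2016). Below is a minimal data-loading sketch assuming PyTorch Geometric; since the paper releases no code, the library choice and the manual full-supervised masking (500 validation, 500 test, remaining nodes for training) are assumptions, not the authors' implementation.

```python
# Data-loading sketch (assumed PyTorch Geometric; not the authors' code).
import torch
from torch_geometric.datasets import Planetoid

def load_semi_supervised(name: str, root: str = "data"):
    """Public (Yang et al. 2016) split: 20 train nodes per class,
    500 validation nodes, 1,000 test nodes."""
    return Planetoid(root=root, name=name, split="public")[0]

def load_full_supervised(name: str, root: str = "data", seed: int = 0):
    """Full-supervised split as described in the paper (following Gao et al.
    2020): 500 validation, 500 test, all remaining nodes for training.
    The node ordering here is an assumption, not the authors' exact split."""
    data = Planetoid(root=root, name=name)[0]
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(data.num_nodes, generator=g)
    val_idx, test_idx, train_idx = perm[:500], perm[500:1000], perm[1000:]
    for attr, idx in [("train_mask", train_idx),
                      ("val_mask", val_idx),
                      ("test_mask", test_idx)]:
        mask = torch.zeros(data.num_nodes, dtype=torch.bool)
        mask[idx] = True
        setattr(data, attr, mask)
    return data

# The three citation networks used in the paper.
for name in ["Cora", "Citeseer", "Pubmed"]:
    data = load_semi_supervised(name)
    print(name, int(data.train_mask.sum()), int(data.val_mask.sum()),
          int(data.test_mask.sum()))
```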
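The Experiment Setup row lists enough hyperparameters to reconstruct the optimizer configuration: two ADAM optimizers (learning rates 0.007 and 0.1), weight decay of 3e-4 on network weights and 1e-3 on architecture parameters, with the GCN classifier excluded from decay. The sketch below shows one way to wire this up in PyTorch using parameter groups; the names `hypernet`, `classifier`, and `arch_params` are hypothetical placeholders for the paper's actual modules.

```python
# Optimizer-setup sketch for the hyperparameters quoted above
# (assumed PyTorch layout; the paper does not release code).
import torch

def build_optimizers(hypernet, classifier, arch_params):
    # Network weights: ADAM, lr 0.007; weight decay 3e-4 on the
    # hyper-network, with the GCN classifier excluded from weight decay.
    weight_optimizer = torch.optim.Adam(
        [
            {"params": hypernet.parameters(), "weight_decay": 3e-4},
            {"params": classifier.parameters(), "weight_decay": 0.0},
        ],
        lr=0.007,
    )
    # Architecture parameters: ADAM, lr 0.1, weight decay 1e-3.
    arch_optimizer = torch.optim.Adam(arch_params, lr=0.1, weight_decay=1e-3)
    return weight_optimizer, arch_optimizer

# The hyper-network (16 channels, 2 layers) is trained for 100 epochs; the
# 60% dropout rate is assumed to be applied inside its layers, e.g. via
# torch.nn.functional.dropout(x, p=0.6, training=self.training).
```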