AutoGT: Automated Graph Transformer Architecture Search

Authors: Zizhao Zhang, Xin Wang, Chaoyu Guan, Ziwei Zhang, Haoyang Li, Wenwu Zhu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments and ablation studies show that our proposed AutoGT gains sufficient improvement over state-of-the-art hand-crafted baselines on all datasets, demonstrating its effectiveness and wide applicability.
Researcher Affiliation | Academia | Zizhao Zhang (1), Xin Wang (1,2), Chaoyu Guan (1), Ziwei Zhang (1), Haoyang Li (1), Wenwu Zhu (1); (1) Department of Computer Science and Technology, Tsinghua University; (2) THU-Bosch JCML Center, Tsinghua University; {zzz22, guancy19, lihy18}@mails.tsinghua.edu.cn, {xin_wang, zwzhang, wwzhu}@tsinghua.edu.cn
Pseudocode | Yes | We list the training procedure of our method in Algorithm 1 ("Our proposed encoding-aware supernet training strategy").
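For readers unfamiliar with one-shot NAS, the sketch below shows a generic weight-sharing supernet training loop with uniform architecture sampling, assuming a PyTorch Geometric-style data loader and a hypothetical supernet(batch, arch) interface. It is not the paper's Algorithm 1, which additionally performs an encoding-aware splitting step described in the appendix.

```python
# Generic one-shot supernet training loop (illustration only); this does NOT
# reproduce the paper's encoding-aware strategy in Algorithm 1. The
# supernet(batch, arch) call is a hypothetical interface.
import random
import torch

def train_supernet(supernet, loader, search_space, num_iters, lr=3e-4, device="cpu"):
    """Weight-sharing training: sample one candidate architecture per batch."""
    supernet.to(device).train()
    optimizer = torch.optim.Adam(supernet.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    data_iter = iter(loader)
    for _ in range(num_iters):
        try:
            batch = next(data_iter)
        except StopIteration:  # restart the loader when an epoch ends
            data_iter = iter(loader)
            batch = next(data_iter)
        batch = batch.to(device)
        # Uniformly sample one choice per searchable dimension (e.g., encodings).
        arch = {name: random.choice(options) for name, options in search_space.items()}
        logits = supernet(batch, arch)
        loss = loss_fn(logits, batch.y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```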
Open Source Code | Yes | Our codes are publicly available at https://github.com/SandMartex/AutoGT
Open Datasets | Yes | Datasets and Baselines. We first consider six graph classification datasets from the Deep Graph Kernels benchmark (Yanardag & Vishwanathan, 2015) and TUDataset (Morris et al., 2020), namely COX2_MD, BZR_MD, PTC_FM, DHFR_MD, PROTEINS, and DBLP. We also adopt three datasets from Open Graph Benchmark (OGB) (Hu et al., 2020a), including OGBG-MolHIV, OGBG-MolBACE, and OGBG-MolBBBP. The task is to predict the label of each graph using node/edge attributes and graph structures. The detailed statistics of the datasets are shown in Table 6 in the appendix.
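To make the data provenance concrete, the following is a minimal loading sketch assuming PyTorch Geometric's TUDataset loader and the ogb package; the paper does not name specific loaders, and the TUDataset identifier for DBLP is assumed here to be DBLP_v1.

```python
# Hedged sketch: load the graph classification datasets listed above, assuming
# torch_geometric and ogb are installed (the paper does not specify its loaders).
from torch_geometric.datasets import TUDataset
from ogb.graphproppred import PygGraphPropPredDataset

# TUDataset names as reported in the paper; "DBLP_v1" is an assumed identifier
# for the DBLP dataset in the TUDataset collection.
tu_names = ["COX2_MD", "BZR_MD", "PTC_FM", "DHFR_MD", "PROTEINS", "DBLP_v1"]
tu_datasets = {name: TUDataset(root="data/TUDataset", name=name) for name in tu_names}

# OGB graph property prediction datasets.
ogb_names = ["ogbg-molhiv", "ogbg-molbace", "ogbg-molbbbp"]
ogb_datasets = {name: PygGraphPropPredDataset(root="data/OGB", name=name) for name in ogb_names}

for name, dataset in {**tu_datasets, **ogb_datasets}.items():
    print(f"{name}: {len(dataset)} graphs")
```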
Dataset Splits | Yes | For all the datasets, we follow Errica et al. (Errica et al., 2020) to utilize 10-fold cross-validation for all the baselines and our proposed method. All the hyper-parameters and training strategies of baselines are implemented according to the publicly available codes (Errica et al., 2020).
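As an illustration of this protocol, the sketch below produces a stratified 10-fold split with scikit-learn; the exact fold generation in Errica et al.'s public code may differ, so treat this only as a sketch.

```python
# Minimal sketch of a stratified 10-fold split in the spirit of the evaluation
# protocol of Errica et al. (2020); their public code may generate folds differently.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def ten_fold_indices(graph_labels, seed=0):
    """Yield (train_idx, test_idx) index pairs for 10-fold cross-validation."""
    y = np.asarray(graph_labels)
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    # StratifiedKFold only needs the labels; the features argument is a placeholder.
    yield from skf.split(np.zeros((len(y), 1)), y)

# Toy usage: 100 graphs with binary labels.
toy_labels = np.random.randint(0, 2, size=100)
for fold, (train_idx, test_idx) in enumerate(ten_fold_indices(toy_labels)):
    print(f"fold {fold}: {len(train_idx)} train graphs, {len(test_idx)} test graphs")
```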
Hardware Specification | No | The paper mentions "single GPU" but does not provide specific details on the GPU model, CPU, or other hardware specifications used for experiments.
Software Dependencies | No | The paper mentions using the "Adam optimizer" and states that hyperparameters and training strategies for baselines are implemented according to publicly available codes, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific libraries).
Experiment Setup | Yes | Implementation Details. Recall that our proposed architecture space has two variants, a larger AutoGT (L = 8, d = 128) and a smaller AutoGT-base (L = 4, d = 32). In our experiments, we adopt the smaller search space for five relatively small datasets, i.e., all datasets except DBLP, and the larger search space for DBLP. We use the Adam optimizer, and the learning rate is 3e-4. For the smaller/larger datasets, we set the number of iterations to split (i.e., Ts in Algorithm 1 in Appendix) as 50/6 and the maximum number of iterations (i.e., Tm in Algorithm 1) as 200/50. The batch size is 128. The hyperparameters of these baselines are kept consistent with our method for a fair comparison.
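The reported settings can be collected into a small configuration sketch; the dataclass and its field names (T_split, T_max) are illustrative rather than taken from the authors' code.

```python
# Hedged summary of the reported hyperparameters; names are illustrative only.
from dataclasses import dataclass

@dataclass
class TrainConfig:
    num_layers: int   # L: number of Transformer layers
    hidden_dim: int   # d: hidden dimension
    T_split: int      # Ts in Algorithm 1: number of iterations before splitting
    T_max: int        # Tm in Algorithm 1: maximum number of iterations
    lr: float = 3e-4  # Adam learning rate
    batch_size: int = 128

# Smaller search space and longer schedule for all datasets except DBLP (AutoGT-base);
# larger search space and shorter schedule for DBLP (AutoGT).
config_small = TrainConfig(num_layers=4, hidden_dim=32, T_split=50, T_max=200)
config_large = TrainConfig(num_layers=8, hidden_dim=128, T_split=6, T_max=50)

# Optimizer implied by the paper (assuming PyTorch):
# optimizer = torch.optim.Adam(model.parameters(), lr=config_small.lr)
```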