AutoGT: Automated Graph Transformer Architecture Search
Authors: Zizhao Zhang, Xin Wang, Chaoyu Guan, Ziwei Zhang, Haoyang Li, Wenwu Zhu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and ablation studies show that our proposed AutoGT gains sufficient improvement over state-of-the-art hand-crafted baselines on all datasets, demonstrating its effectiveness and wide applicability. |
| Researcher Affiliation | Academia | Zizhao Zhang1, Xin Wang1,2, Chaoyu Guan1, Ziwei Zhang1, Haoyang Li1, Wenwu Zhu1 1Department of Computer Science and Technology, Tsinghua University 2THU-Bosch JCML Center, Tsinghua University {zzz22, guancy19, lihy18}@mails.tsinghua.edu.cn {xin_wang, zwzhang, wwzhu}@tsinghua.edu.cn |
| Pseudocode | Yes | We list the training procedure of our method in Algorithm 1. (Algorithm 1: Our proposed encoding-aware supernet training strategy) |
| Open Source Code | Yes | Our codes are publicly available at https://github.com/SandMartex/AutoGT |
| Open Datasets | Yes | Datasets and Baselines. We first consider six graph classification datasets from Deep Graph Kernels Benchmark (Yanardag & Vishwanathan, 2015) and TUDataset (Morris et al., 2020), namely COX2_MD, BZR_MD, PTC_FM, DHFR_MD, PROTEINS, and DBLP. We also adopt three datasets from Open Graph Benchmark (OGB) (Hu et al., 2020a), including OGBG-MolHIV, OGBG-MolBACE, and OGBG-MolBBBP. The task is to predict the label of each graph using node/edge attributes and graph structures. The detailed statistics of the datasets are shown in Table 6 in the appendix. |
| Dataset Splits | Yes | For all the datasets, we follow Errica et al. (Errica et al., 2020) to utilize 10-fold cross-validation for all the baselines and our proposed method. All the hyper-parameters and training strategies of baselines are implemented according to the publicly available codes (Errica et al., 2020). |
| Hardware Specification | No | The paper mentions "single GPU" but does not provide specific details on the GPU model, CPU, or other hardware specifications used for experiments. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer" and states that hyperparameters and training strategies for baselines are implemented according to publicly available codes, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | Implementation Details. Recall that our proposed architecture space has two variants, a larger AutoGT (L = 8, d = 128) and a smaller AutoGTbase (L = 4, d = 32). In our experiments, we adopt the smaller search space for five relatively small datasets, i.e., all datasets except DBLP, and the larger search space for DBLP. We use the Adam optimizer, and the learning rate is 3e-4. For the smaller/larger datasets, we set the number of iterations to split (i.e., Ts in Algorithm 1 in Appendix) as 50/6 and the maximum number of iterations (i.e., Tm in Algorithm 1) as 200/50. The batch size is 128. The hyperparameters of these baselines are kept consistent with our method for a fair comparison. |
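
The Dataset Splits row quotes the 10-fold cross-validation protocol of Errica et al. (2020). Below is a minimal sketch of such a protocol; the `evaluate_fold` helper is a hypothetical placeholder, and the use of scikit-learn's `StratifiedKFold` is an assumption rather than the authors' released split code.

```python
# Hedged sketch of a 10-fold cross-validation protocol in the spirit of
# Errica et al. (2020). `evaluate_fold` is a hypothetical helper that trains
# a model on the training folds and returns test accuracy on the held-out
# fold; it is not part of the AutoGT repository.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(graphs, labels, evaluate_fold, n_splits=10, seed=0):
    """Run stratified k-fold CV and report mean/std test accuracy."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracies = []
    for fold, (train_idx, test_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
        acc = evaluate_fold(
            [graphs[i] for i in train_idx], [labels[i] for i in train_idx],
            [graphs[i] for i in test_idx], [labels[i] for i in test_idx],
        )
        accuracies.append(acc)
        print(f"fold {fold}: test accuracy = {acc:.4f}")
    return float(np.mean(accuracies)), float(np.std(accuracies))
```

Errica et al. (2020) additionally perform model selection inside each fold via an inner holdout split; that step is omitted here for brevity.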
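
The Pseudocode and Experiment Setup rows reference Algorithm 1 (the encoding-aware supernet training strategy) together with its training hyperparameters: Adam, learning rate 3e-4, batch size 128, Ts iterations between splits, and Tm maximum iterations. The skeleton below only wires these quoted hyperparameters into a generic weight-sharing training loop; the supernet class, the `sample_architecture` call, and the `split_search_space` step are assumed placeholders standing in for the details of Algorithm 1, which the table does not reproduce.

```python
# Hedged skeleton of a supernet training loop built around the quoted
# hyperparameters (Adam, lr 3e-4, batch size 128, Ts, Tm). The supernet API
# (sample_architecture, split_search_space) is an assumed placeholder and
# does not reproduce the paper's Algorithm 1.
import torch
from torch.utils.data import DataLoader

def train_supernet(supernet, dataset, collate_fn, T_s=50, T_m=200,
                   batch_size=128, lr=3e-4, device="cuda"):
    supernet = supernet.to(device)
    optimizer = torch.optim.Adam(supernet.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        collate_fn=collate_fn)
    criterion = torch.nn.CrossEntropyLoss()
    for iteration in range(1, T_m + 1):
        for graphs, labels in loader:
            arch = supernet.sample_architecture()       # assumed: sample one candidate architecture
            logits = supernet(graphs.to(device), arch)  # weight-sharing forward pass
            loss = criterion(logits, labels.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if iteration % T_s == 0:
            supernet.split_search_space()               # assumed: encoding-aware split of the search space
    return supernet
```

Per the quoted setup, the smaller datasets use Ts = 50 and Tm = 200, while DBLP uses Ts = 6 and Tm = 50.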