Disentangled Continual Graph Neural Architecture Search with Invariant Modular Supernet
Authors: Zeyang Zhang, Xin Wang, Yijian Qin, Hong Chen, Ziwei Zhang, Xu Chu, Wenwu Zhu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method achieves state-of-the-art performance against baselines in continual graph neural architecture search. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Technology, BNRIST, Tsinghua University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 The pipeline for GASIM Require: The number of tasks T, hyperparameter λ, K. 1: Construct the modular super-network in Sec. 4.1. 2: for l = 1, . . . , T do 3: Predict the latent factors as Eq. (8) 4: Route the task to the module as Eq. (10) 5: Calculate routing loss as Eq. (11) 6: Calculate invariance loss as Eq. (13) 7: Calculate the final loss as Eq. (14) 8: Search the architecture according to Eq. (15) 9: Fix the architecture and finetune the weights 10: end for |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code or links to a code repository for the described methodology. |
| Open Datasets | Yes | Cora Full (McCallum et al., 2000), Arxiv (Hu et al., 2020), and Reddit (Hamilton et al., 2017). |
| Dataset Splits | Yes | All datasets are partitioned into a set of tasks, each focusing on the node classification problem, where each task involves nodes from two distinct classes within an incoming graph. For each task, 60% of the nodes are allocated for training, 20% for validation, and 20% for testing. |
| Hardware Specification | Yes | CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz GPU: NVIDIA GeForce RTX 4090 with 24 GB of memory |
| Software Dependencies | Yes | Software: Python 3.8.18, CUDA 12.2, PyTorch (Paszke et al., 2019) 2.1.2, PyTorch Geometric (Fey & Lenssen, 2019) 2.4.0. |
| Experiment Setup | Yes | For fair comparisons, all methods adopt the same dimensionality d of 512 and number of layers of 2. Adam optimizer (Kingma & Ba, 2014) is adopted to optimize the model weights with a learning rate 1e-3 and another SGD optimizer with a learning rate 1e-2 is adopted to optimize architecture parameters for NAS methods. For our method, we adopt K = 3 for all datasets, and the hyperparameter λ ∈ {0.01, 0.1, 1, 10, 100}. |
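
The Pseudocode row quotes the paper's Algorithm 1 in flattened form. As a reading aid, the Python skeleton below mirrors its per-task control flow only; every name (`gasim_pipeline`, `supernet.predict_latent_factors`, the λ-weighted loss combination, and so on) is a placeholder inferred from the quoted steps, since no code is released and equations (8)–(15) are not reproduced here.

```python
# Hypothetical skeleton of Algorithm 1 (GASIM pipeline) as quoted above.
# All method names and the loss combination are assumptions: they mirror
# the per-task control flow of the quoted steps, not the authors' code.

def gasim_pipeline(tasks, supernet, lam=1.0, K=3):
    """Run the continual NAS loop over the task sequence (Steps 2-10)."""
    for task in tasks:                                      # Step 2: for l = 1..T
        z = supernet.predict_latent_factors(task)           # Step 3: Eq. (8)
        module = supernet.route(z, K)                       # Step 4: Eq. (10)
        loss_route = supernet.routing_loss(z, module)       # Step 5: Eq. (11)
        loss_inv = supernet.invariance_loss(task, module)   # Step 6: Eq. (13)
        loss_task = supernet.task_loss(task, module)        # assumed task-level loss term
        loss = loss_task + lam * (loss_route + loss_inv)    # Step 7: Eq. (14), placeholder form
        arch = supernet.search_architecture(loss)           # Step 8: Eq. (15)
        supernet.finetune(arch, task)                       # Step 9: fix arch, tune weights
```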
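The Dataset Splits row reports a 60/20/20 per-task node split. The minimal PyTorch sketch below shows one way to build such boolean masks; the function name and the fixed seed are illustrative assumptions, as the exact splitting code is not provided.

```python
import torch

def split_task_nodes(num_nodes: int, seed: int = 0):
    """Randomly split a task's nodes 60/20/20 into train/val/test masks.
    Illustrative only; the paper's exact splitting procedure is not released."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=g)
    n_train = int(0.6 * num_nodes)
    n_val = int(0.2 * num_nodes)

    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True
    return train_mask, val_mask, test_mask
```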
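The Experiment Setup row describes two optimizers: Adam with learning rate 1e-3 for model weights and SGD with learning rate 1e-2 for architecture parameters, a common differentiable-NAS arrangement. The sketch below wires up that configuration with standard `torch.optim` calls; `weight_parameters()` and `arch_parameters()` are assumed accessors on a supernet-style model, not an API documented in the paper.

```python
import torch

def build_optimizers(model):
    """Set up the two optimizers reported in the experiment setup.
    weight_parameters() / arch_parameters() are assumed accessors on a
    supernet-style model that separates weights from architecture params."""
    weight_opt = torch.optim.Adam(model.weight_parameters(), lr=1e-3)
    arch_opt = torch.optim.SGD(model.arch_parameters(), lr=1e-2)
    return weight_opt, arch_opt
```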