Disentangled Continual Graph Neural Architecture Search with Invariant Modular Supernet

Authors: Zeyang Zhang, Xin Wang, Yijian Qin, Hong Chen, Ziwei Zhang, Xu Chu, Wenwu Zhu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method achieves state-of-the-art performance against baselines in continual graph neural architecture search.
Researcher Affiliation | Academia | Department of Computer Science and Technology, BNRIST, Tsinghua University, Beijing, China.
Pseudocode | Yes | Algorithm 1 (the pipeline for GASIM). Require: the number of tasks T and hyperparameters λ, K. (1) Construct the modular super-network (Sec. 4.1). (2) For l = 1, ..., T: (3) predict the latent factors as in Eq. (8); (4) route the task to the modules as in Eq. (10); (5) calculate the routing loss as in Eq. (11); (6) calculate the invariance loss as in Eq. (13); (7) calculate the final loss as in Eq. (14); (8) search the architecture according to Eq. (15); (9) fix the architecture and fine-tune the weights; (10) end for. (A hedged code sketch of this loop appears below the table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing code or a link to a code repository for the described methodology.
Open Datasets | Yes | CoraFull (McCallum et al., 2000), Arxiv (Hu et al., 2020), and Reddit (Hamilton et al., 2017). (A loading sketch appears below the table.)
Dataset Splits | Yes | All datasets are partitioned into a set of tasks, each focusing on node classification, where each task involves nodes from two distinct classes within an incoming graph. For each task, 60% of the nodes are allocated for training, 20% for validation, and 20% for testing. (See the split sketch below the table.)
Hardware Specification | Yes | CPU: Intel(R) Xeon(R) Gold 5218R @ 2.10GHz; GPU: NVIDIA GeForce RTX 4090 with 24 GB of memory.
Software Dependencies | Yes | Python 3.8.18, CUDA 12.2, PyTorch (Paszke et al., 2019) 2.1.2, PyTorch Geometric (Fey & Lenssen, 2019) 2.4.0.
Experiment Setup | Yes | For fair comparisons, all methods adopt the same dimensionality d = 512 and the same number of layers (2). The Adam optimizer (Kingma & Ba, 2014) is used to optimize the model weights with a learning rate of 1e-3, and a separate SGD optimizer with a learning rate of 1e-2 is used to optimize the architecture parameters for NAS methods. For our method, we adopt K = 3 for all datasets, and the hyperparameter λ ∈ {0.01, 0.1, 1, 10, 100}. (See the configuration sketch below the table.)
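
The Pseudocode row (Algorithm 1) can be read alongside the minimal PyTorch sketch below. The placeholder MLP modules, the concrete stand-ins for Eqs. (8)-(15) (an entropy routing penalty and a variance-based invariance penalty), the toy data, and the number of search epochs are all illustrative assumptions; this is not the authors' implementation.

    # Hypothetical sketch of the per-task GASIM loop from Algorithm 1.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    K, D, NUM_CLASSES, T = 3, 512, 2, 5   # modules, hidden dim, classes per task, tasks
    LAMBDA = 0.1                          # trade-off hyperparameter λ

    # Step 1: modular super-network with K candidate modules (MLP placeholders here).
    modules = nn.ModuleList([
        nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, NUM_CLASSES))
        for _ in range(K)
    ])
    factor_predictor = nn.Linear(D, K)           # stand-in for the latent-factor predictor (Eq. 8)
    arch_params = nn.Parameter(torch.zeros(K))   # architecture parameters searched per task

    weight_opt = torch.optim.Adam(
        list(modules.parameters()) + list(factor_predictor.parameters()), lr=1e-3)
    arch_opt = torch.optim.SGD([arch_params], lr=1e-2)

    for task in range(T):                                    # Step 2: iterate over the task stream
        x = torch.randn(128, D)                              # toy node features for this task
        y = torch.randint(0, NUM_CLASSES, (128,))            # toy labels (two classes per task)

        for _ in range(50):                                  # search epochs per task (assumed)
            factors = factor_predictor(x).mean(dim=0)        # Step 3: predict latent factors
            route = F.softmax(factors + arch_params, dim=0)  # Step 4: route the task to modules
            logits = sum(route[k] * modules[k](x) for k in range(K))

            task_loss = F.cross_entropy(logits, y)
            routing_loss = -(route * route.log()).sum()      # Step 5: entropy placeholder for Eq. (11)
            invariance_loss = route.var()                    # Step 6: placeholder for Eq. (13)
            loss = task_loss + LAMBDA * (routing_loss + invariance_loss)  # Step 7: Eq. (14) stand-in

            weight_opt.zero_grad()
            arch_opt.zero_grad()
            loss.backward()
            weight_opt.step()
            arch_opt.step()                                  # Step 8: update the searched architecture

        chosen = int(arch_params.argmax())                   # Step 9: fix the chosen module, then
        print(f"task {task}: selected module {chosen}")
        # fine-tune only that module's weights on this task (omitted in this sketch).

The two optimizers mirror the Experiment Setup row: Adam at learning rate 1e-3 for the model weights and SGD at 1e-2 for the architecture parameters.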
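
For the Open Datasets row, the snippet below shows one plausible way to obtain the three public datasets with PyTorch Geometric and OGB. The OGB package and the root directories are assumptions not listed in the Software Dependencies row, and the paper does not state which loaders it used.

    # Hypothetical loading of the three benchmark datasets (root paths are placeholders).
    from torch_geometric.datasets import CoraFull, Reddit
    from ogb.nodeproppred import PygNodePropPredDataset

    cora_full = CoraFull(root="data/CoraFull")                          # McCallum et al., 2000
    arxiv = PygNodePropPredDataset(name="ogbn-arxiv", root="data/OGB")  # Hu et al., 2020
    reddit = Reddit(root="data/Reddit")                                 # Hamilton et al., 2017

    for name, dataset in [("CoraFull", cora_full), ("Arxiv", arxiv), ("Reddit", reddit)]:
        data = dataset[0]
        print(name, data.num_nodes, int(data.y.max()) + 1)  # node count and number of classes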
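
The Dataset Splits row can be made concrete with the sketch below, which groups classes into two-class tasks and splits each task's nodes 60/20/20 into train/validation/test masks. The random seed, the shuffling, and the grouping of consecutive class indices are assumptions.

    # Hypothetical per-task split: two classes per task, 60/20/20 train/val/test.
    import torch

    def make_task_splits(labels: torch.Tensor, classes_per_task: int = 2, seed: int = 0):
        """Return, per task, the task's classes and boolean train/val/test masks over all nodes."""
        gen = torch.Generator().manual_seed(seed)
        num_classes = int(labels.max()) + 1
        tasks = []
        for start in range(0, num_classes - classes_per_task + 1, classes_per_task):
            task_classes = list(range(start, start + classes_per_task))
            idx = torch.cat([(labels == c).nonzero(as_tuple=True)[0] for c in task_classes])
            idx = idx[torch.randperm(idx.numel(), generator=gen)]  # shuffle this task's nodes
            n_train = int(0.6 * idx.numel())
            n_val = int(0.2 * idx.numel())
            parts = (idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:])
            masks = {}
            for name, part in zip(("train", "val", "test"), parts):
                mask = torch.zeros(labels.numel(), dtype=torch.bool)
                mask[part] = True
                masks[name] = mask
            tasks.append((task_classes, masks))
        return tasks

    # Example with toy labels; with real data, `labels` would be data.y from PyTorch Geometric.
    toy_labels = torch.randint(0, 6, (1000,))
    splits = make_task_splits(toy_labels)
    print(len(splits), splits[0][0])  # 3 tasks; the first covers classes [0, 1]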
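
The remaining numbers in the Experiment Setup row (d = 512, 2 layers, K = 3, and the λ grid) map onto a supernet configuration such as the one below; the two optimizers at learning rates 1e-3 (Adam, weights) and 1e-2 (SGD, architecture parameters) already appear in the pipeline sketch above. The use of GCNConv layers is an assumption.

    # Hypothetical supernet configuration matching the reported hyperparameters.
    import torch.nn as nn
    from torch_geometric.nn import GCNConv

    D, NUM_LAYERS, K = 512, 2, 3            # hidden dimensionality, layers, modules
    LAMBDA_GRID = [0.01, 0.1, 1, 10, 100]   # λ is tuned over this grid

    # K candidate modules, each a 2-layer GNN with hidden size 512 (layer type assumed).
    modules = nn.ModuleList([
        nn.ModuleList([GCNConv(D, D) for _ in range(NUM_LAYERS)]) for _ in range(K)
    ])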