Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness
Authors: Bohang Zhang, Jingchu Gai, Yiheng Du, Qiwei Ye, Di He, Liwei Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, extensive experiments on both synthetic and real-world tasks verify our theory, showing that the practical performance of GNN models aligns well with the proposed metric. This section aims to verify our theory through a comprehensive set of experiments. |
| Researcher Affiliation | Academia | 1National Key Laboratory of General Artificial Intelligence, SIST, Peking University 2School of Mathematical Science, Peking University 3Yuanpei College, Peking University 4Beijing Academy of Artificial Intelligence 5Center for Machine Learning Research, Peking University |
| Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks. It describes algorithms textually and with mathematical formulas. |
| Open Source Code | Yes | Our code is available at https://github.com/subgraph23/homomorphism-expressivity. |
| Open Datasets | Yes | We use the benchmark dataset from Zhao et al. (2022a) and comprehensively test the homomorphism expressivity at graph/node/edge-level by carefully selecting 8 substructures shown in Table 1. ZINC (Dwivedi et al., 2020) is a standard real-world dataset for benchmarking molecular property prediction. Alchemy (Chen et al., 2019a) is another real-world dataset with 12 graph-level quantum mechanical properties. (See the ZINC loading sketch after this table.) |
| Dataset Splits | Yes | The initial learning rate is chosen as 0.001 and is decayed by a factor of 0.9 once the MAE on the validation set plateaus for 10 epochs. Each model is trained for 400 epochs on ZINC-subset and 500 epochs on ZINC-full, both with a batch size of 128. We report the MAE for the model checkpoint with the best validation performance. We follow the sampling and training protocol from Lim et al. (2023); Puny et al. (2023), using 100K samples for training, 10K samples for testing, and 10K samples for validation. (See the learning-rate schedule sketch after this table.) |
| Hardware Specification | Yes | All experiments are run on a single NVIDIA Tesla V100 GPU. |
| Software Dependencies | Yes | All models are implemented using the PyTorch (Paszke et al., 2019) framework and the PyTorch Geometric library (Fey & Lenssen, 2019). |
| Experiment Setup | Yes | To ensure fairness, we employ the same GIN-based design (Xu et al., 2019) for all models and control their model sizes and training budgets to be roughly the same on each task. All models are trained using the Adam optimizer. For all tasks, we use the distance encoding hyper-parameter max_dis = 5. We use a model depth of L = 5 in all experiments. The initial learning rate is chosen as 0.001 and is decayed by a factor of 0.9 once the MAE on the validation set plateaus for 10 epochs. Each model is trained for 1200 epochs with a batch size of 512. (See the GIN/Adam setup sketch after this table.) |
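
The "Open Datasets" row quotes ZINC and Alchemy as benchmark datasets. As a minimal sketch of how such data can be loaded with the PyTorch Geometric library named in the "Software Dependencies" row, the snippet below pulls the standard ZINC subset splits; the root path is an assumption and the batch size of 128 mirrors the quoted protocol. This is an illustrative loader, not the authors' released code (see their repository linked above for the exact pipeline).

```python
# Illustrative only: loads the ZINC 12K-molecule subset with PyTorch Geometric.
# The root directory is an assumption; batch size 128 follows the quoted protocol.
from torch_geometric.datasets import ZINC
from torch_geometric.loader import DataLoader

train_set = ZINC(root="data/ZINC", subset=True, split="train")  # subset=False gives ZINC-full
val_set = ZINC(root="data/ZINC", subset=True, split="val")
test_set = ZINC(root="data/ZINC", subset=True, split="test")

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128)
```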
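The "Dataset Splits" row describes the learning-rate schedule: an initial rate of 0.001 decayed by a factor of 0.9 once the validation MAE plateaus for 10 epochs. A minimal sketch of that schedule with PyTorch's `ReduceLROnPlateau` follows; the stand-in model and the placeholder validation metric are assumptions made only so the snippet runs on its own.

```python
# Sketch of the quoted schedule: Adam at lr 1e-3, decayed by a factor of 0.9 after
# the monitored metric (validation MAE) fails to improve for 10 epochs.
import torch

model = torch.nn.Linear(8, 1)  # stand-in for a GNN; assumption for self-containment
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.9, patience=10
)

for epoch in range(400):       # 400 epochs on ZINC-subset (500 on ZINC-full), per the quote
    val_mae = 1.0              # placeholder: compute the real validation MAE here
    scheduler.step(val_mae)    # lr is multiplied by 0.9 once the plateau criterion is met
```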
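The "Experiment Setup" row states that all models share a GIN-based design with depth L = 5 and are trained with Adam. The sketch below wires up a plain 5-layer GIN using PyTorch Geometric's `GINConv` with an Adam optimizer at the quoted learning rate; the hidden width, MLP shapes, and readout are illustrative assumptions, and the snippet omits the paper's distance encoding and subgraph-GNN components.

```python
# Illustrative 5-layer GIN backbone (not the authors' full architecture).
import torch
from torch_geometric.nn import GINConv, global_add_pool


class GIN(torch.nn.Module):
    def __init__(self, in_dim=64, hidden=64, out_dim=1, num_layers=5):  # L = 5, per the quote
        super().__init__()
        self.convs = torch.nn.ModuleList()
        for i in range(num_layers):
            mlp = torch.nn.Sequential(
                torch.nn.Linear(in_dim if i == 0 else hidden, hidden),
                torch.nn.ReLU(),
                torch.nn.Linear(hidden, hidden),
            )
            self.convs.append(GINConv(mlp))
        self.readout = torch.nn.Linear(hidden, out_dim)

    def forward(self, x, edge_index, batch):
        for conv in self.convs:
            x = conv(x, edge_index).relu()
        return self.readout(global_add_pool(x, batch))  # graph-level prediction


model = GIN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam at lr 0.001, as quoted
```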