Graph Neural Networks Need Cluster-Normalize-Activate Modules

Authors: Arseny Skryagin, Felix Divo, Mohammad Amin Ali, Devendra S Dhami, Kristian Kersting

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate in node classification and property prediction tasks that CNA significantly improves the accuracy over the state-of-the-art. In particular, CNA reaches 94.18% and 95.75% accuracy on Cora and CiteSeer, respectively. It further benefits GNNs in regression tasks, reducing the mean squared error compared to all baselines. To evaluate the effectiveness of CNA with GNNs, we aim to answer the following research questions: (Q1) Does CNA limit oversmoothing? (Q2) Does CNA improve the performance in node classification, node regression, and graph classification tasks? (Q3) Can CNA allow for fewer parameters while maintaining strong performance when scaling to very large graphs? (Q4) Model analysis: How important is each of the three steps in CNA? How do hyperparameters affect the results?
Researcher Affiliation | Academia | Arseny Skryagin (1), Felix Divo (1), Mohammad Amin Ali (1), Devendra Singh Dhami (2), Kristian Kersting (1,3,4,5). (1) AI & ML Group, TU Darmstadt; (2) TU Eindhoven; (3) Hessian Center for AI (hessian.AI); (4) German Research Center for AI (DFKI); (5) Centre for Cognitive Science, TU Darmstadt. Contact: {arseny.skryagin,felix.divo,kersting}@cs.tu-darmstadt.de, amin.ali@stud.tu-darmstadt.de, d.s.dhami@tue.nl
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. (An illustrative, non-authoritative sketch of the CNA idea is given after this table.)
Open Source Code | Yes | Code is available at https://github.com/ml-research/cna_modules.
Open Datasets | Yes | Specifically, we evaluate the performance on the following datasets: Cora, CoraFull [Kipf and Welling, 2016], CiteSeer [Bojchevski and Günnemann, 2018], PubMed [Sen et al., 2008], DBLP [Tang et al., 2008], Computers and Photo [Shchur et al., 2019], Chameleon, Squirrel, Texas, and Wisconsin [Pei et al., 2020]. The results in Table 4 demonstrate the effectiveness of CNA: it outperforms the SOTA on 8 of those 11 datasets.
Dataset Splits | Yes | We matched the datasets and train/val/test splits as closely to the original publications as possible to maintain comparability with the existing literature and reproducibility.
Hardware Specification | Yes | Regardless of the setting, each experiment was performed on one Nvidia A100 GPU and took between five minutes and two hours, depending on the specific configuration.
Software Dependencies | No | The paper mentions software packages such as PyTorch Geometric (PyG), Fast PyTorch KMeans, and an activation functions library, but does not specify their version numbers, so the ancillary software is not described reproducibly.
Experiment Setup | Yes | Details on the choice of hyperparameters and training settings are provided in Appendix A.2. Average performances and standard deviations are over 5 seeds used for model initialization for all results, except for Tables 1 and 6, where we used 20. Table 9 lists all relevant hyperparameters used for node classification, property prediction, and graph-level classification tasks. Table 10 provides the hyperparameters for Table 1, for the node regression task, and for the ablation study. For all experiments, we used the Adam optimizer with weight decay, where we set β1 = 0.9 and β2 = 0.999. (A hedged setup sketch using PyTorch Geometric follows the CNA sketch below.)
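
Since the paper itself provides no pseudocode (see the Pseudocode row), the following is a minimal sketch of what a Cluster-Normalize-Activate step could look like in PyTorch. It is an illustration only, assuming nearest-centroid clustering, per-cluster LayerNorm, and per-cluster PReLU activations; the class name CNASketch and all design choices are assumptions, not the authors' implementation (which lives at https://github.com/ml-research/cna_modules).

```python
# Hypothetical sketch of a Cluster-Normalize-Activate (CNA) step, for illustration only.
# It is NOT the authors' implementation; the clustering, normalization, and activation
# choices below are assumptions made to show the three stages named in the paper title.
import torch
import torch.nn as nn


class CNASketch(nn.Module):
    """Cluster node embeddings, normalize each cluster, and activate per cluster."""

    def __init__(self, dim: int, num_clusters: int = 4):
        super().__init__()
        self.num_clusters = num_clusters
        # One normalization layer and one learnable activation per cluster.
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(num_clusters)])
        self.acts = nn.ModuleList([nn.PReLU() for _ in range(num_clusters)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1) Cluster: nearest-centroid assignment with randomly picked centroids
        #    (a k-means library could be substituted here).
        centroids = x[torch.randperm(x.size(0))[: self.num_clusters]].detach()
        assignment = torch.cdist(x, centroids).argmin(dim=1)

        out = torch.zeros_like(x)
        for c in range(self.num_clusters):
            mask = assignment == c
            if mask.any():
                # 2) Normalize within the cluster, then 3) activate with its own nonlinearity.
                out[mask] = self.acts[c](self.norms[c](x[mask]))
        return out


# Example: apply the sketch to 100 random 16-dimensional node embeddings.
cna = CNASketch(dim=16, num_clusters=4)
print(cna(torch.randn(100, 16)).shape)  # torch.Size([100, 16])
```

Replacing the random-centroid assignment with proper k-means (e.g. the Fast PyTorch KMeans package mentioned in the Software Dependencies row) would match the tooling described in the paper more closely.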
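
For the Open Datasets and Experiment Setup rows, a short sketch of how the public splits and the stated Adam configuration could be reproduced with PyTorch Geometric is given below. The Planetoid loader call is standard PyG; the learning rate and weight-decay value are placeholders, since the report only quotes β1 = 0.9 and β2 = 0.999 and defers the remaining values to Appendix A.2 / Table 9.

```python
# Hypothetical reproduction of the stated training setup, not the authors' exact configuration.
import torch
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root="data/Planetoid", name="Cora")  # also available: "CiteSeer", "PubMed"
data = dataset[0]  # one graph with data.train_mask / data.val_mask / data.test_mask

# Stand-in model; the paper trains GNNs equipped with CNA modules instead.
model = torch.nn.Linear(dataset.num_features, dataset.num_classes)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,              # placeholder, not taken from the paper
    betas=(0.9, 0.999),   # as stated in the Experiment Setup row
    weight_decay=5e-4,    # "Adam optimizer with weight decay"; the exact value is a placeholder
)
```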