Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Expressivity and Sample Complexity of Node-Individualized Graph Neural Networks

Authors: Paolo Pellizzoni, Till Hendrik Schulz, Dexiong Chen, Karsten Borgwardt

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, our theoretical findings are validated experimentally on both synthetic and real-world datasets.
Researcher Affiliation | Academia | Paolo Pellizzoni, Till Hendrik Schulz, Dexiong Chen, and Karsten Borgwardt (Max Planck Institute of Biochemistry, Martinsried, Germany)
Pseudocode | Yes | The Tinhofer algorithm [59, 4] returns an ordering of the nodes of a graph. It works as follows. 1. Run color refinement on G and obtain the stable color partition P(G). 2. Given the partition P(G): if all nodes belong to singleton color classes, return the ordering of the nodes based on the lexicographic order of their colors; else, pick the color class with the lexicographically smallest color among those containing at least two nodes, individualize one arbitrary node in that class by assigning it the smallest unused color, and go to step 1.
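The individualization-refinement loop quoted above can be sketched in plain Python. This is a hedged illustration, not the authors' implementation: the graph is given as an adjacency list, colors are canonical integers assigned in lexicographic signature order, and `max(colors) + 1` stands in for "the smallest unused color" (it is always unused, though not necessarily the smallest such value).

```python
from collections import Counter

def color_refinement(adj, colors):
    """Run 1-WL color refinement until the color partition stabilizes."""
    n = len(adj)
    while True:
        # A node's signature is its color plus the multiset of neighbor colors.
        signatures = [
            (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in range(n)
        ]
        # Relabel signatures canonically by their lexicographic order.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        new_colors = [palette[sig] for sig in signatures]
        if new_colors == colors:  # fixpoint reached: stable partition
            return colors
        colors = new_colors

def tinhofer_ordering(adj):
    """Return a node ordering via iterated individualization-refinement."""
    n = len(adj)
    colors = [0] * n
    while True:
        colors = color_refinement(adj, colors)
        counts = Counter(colors)
        # Step 2, first branch: all color classes are singletons.
        if all(k == 1 for k in counts.values()):
            return sorted(range(n), key=lambda v: colors[v])
        # Step 2, second branch: smallest color class with >= 2 nodes;
        # individualize one of its nodes with a fresh (unused) color.
        target = min(c for c, k in counts.items() if k >= 2)
        v = next(u for u in range(n) if colors[u] == target)
        colors[v] = max(colors) + 1
```

On a triangle, for example, all three nodes start in one color class, so two individualization rounds are needed before every class is a singleton and an ordering can be returned.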
Open Source Code | Yes | Code and datasets are available at https://github.com/BorgwardtLab/NodeIndividualizedGNNs.
Open Datasets | Yes | Real-world datasets (i.e., NCI1, IMDB-b, MCF-7, Mutagenicity, COLLAB-b, and Peptides-func) were provided by [31, 38] and [18].
Dataset Splits | No | Note that since the focus of this paper is on the (worst-case) generalization gap, the best epoch is not chosen using a validation dataset, as it should be done in practice.
Hardware Specification | Yes | The experiments are run on a cluster equipped with Intel(R) Xeon(R) Silver 4116 CPUs and NVIDIA H100 GPUs.
Software Dependencies | No | The code is based on PyTorch and PyTorch Geometric.
Experiment Setup | Yes | We fixed the embedding dimension to 256 and use an Adam optimizer with a learning rate of 0.0001.
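The quote above names the learning rate but not the remaining Adam hyperparameters. A minimal pure-Python sketch of one scalar Adam step with the stated learning rate follows; the values β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the common defaults and are an assumption here, not taken from the paper.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    m and v are the running first- and second-moment estimates;
    t is the 1-based step count used for bias correction.
    """
    m = b1 * m + (1 - b1) * grad       # first-moment EMA
    v = b2 * v + (1 - b2) * grad ** 2  # second-moment EMA
    m_hat = m / (1 - b1 ** t)          # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

Because the bias-corrected ratio m_hat / sqrt(v_hat) has magnitude close to 1 on the first step, the initial update moves each parameter by roughly the learning rate, 0.0001.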