Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention

Authors: Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The resulting architecture, Cluster-wise Graph Transformer (Cluster-GT), which uses node clusters as tokens and employs our proposed N2C-Attn module, shows superior performance on various graph-level tasks. Code is available at https://github.com/LUMIA-Group/Cluster-wise-Graph-Transformer. To evaluate the performance of Cluster-GT, we compare it against two categories of methods: Graph Pooling and Graph Transformers. We conduct experiments on eight graph classification datasets from different domains, including social networks and biology.
Researcher Affiliation | Academia | Siyuan Huang (1,2), Yunchong Song (1), Jiayue Zhou (2), Zhouhan Lin (1); (1) LUMIA Lab, Shanghai Jiao Tong University; (2) Paris Elite Institute of Technology, Shanghai Jiao Tong University
Pseudocode | No | The paper describes procedures and mathematical formulations but does not present them in a structured pseudocode or algorithm block.
Open Source Code | Yes | Code is available at https://github.com/LUMIA-Group/Cluster-wise-Graph-Transformer.
Open Datasets | Yes | We conduct experiments on eight graph classification datasets from different domains, including social networks and biology. (...) Table 3: Summary statistics of datasets (IMDB-BINARY, IMDB-MULTI, COLLAB, MUTAG, PROTEINS, D&D, ZINC, MolHIV). A minimal loading sketch follows the table.
Dataset Splits | Yes | Moreover, 10 percent of the training data is allocated as validation data to ensure a fair comparison, as per [10]. We utilize a standard train/validation/test dataset split following [18]. (See the split sketch below the table.)
Hardware Specification | Yes | All experiments are conducted on NVIDIA RTX 3090s with 24GB of RAM.
Software Dependencies | No | The model is implemented using PyTorch and PyG [11]. (Library versions are not specified.)
Experiment Setup | Yes | For optimization, the Adam [26] optimizer is utilized, adhering to the default settings of β1 = 0.9, β2 = 0.999, and ε = 1e-8. An early stopping criterion is implemented, halting training if there is no improvement in validation loss over 50 epochs. The training process is capped at a maximum of 500 epochs. We use a batch size of 64. (A training-setup sketch follows the table.)
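
To make the dataset row concrete, here is a minimal loading sketch assuming the TU benchmarks are fetched through PyG's TUDataset; the root path and the choice of MUTAG are illustrative, and ZINC and MolHIV need their own loaders (torch_geometric.datasets.ZINC and OGB's graph property prediction interface), which are not shown.

```python
# Minimal sketch (our illustration): fetching one of the listed TU benchmarks with PyG.
# The root directory is arbitrary; ZINC and MolHIV use other loaders.
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

dataset = TUDataset(root="data/TUDataset", name="MUTAG")  # also: PROTEINS, DD, COLLAB, IMDB-BINARY, IMDB-MULTI
print(dataset)                                            # e.g. MUTAG(188)
print(dataset.num_classes, dataset.num_node_features)

loader = DataLoader(dataset, batch_size=64, shuffle=True)  # batch size 64, as reported
for batch in loader:
    print(batch)  # a batched graph with x, edge_index, y, and the batch assignment vector
    break
```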
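The split row quotes "10 percent of the training data is allocated as validation data"; a rough sketch of such a carve-out is below. The exact protocols of [10] and [18] are not reproduced here, and the seed and shuffling are our assumptions.

```python
# Rough sketch of carving 10% of a training set out as validation data.
# Not the exact protocol of [10]/[18]; the seed and ordering are assumptions.
import torch

def train_val_split(train_dataset, val_ratio=0.1, seed=0):
    n = len(train_dataset)
    n_val = int(n * val_ratio)
    perm = torch.randperm(n, generator=torch.Generator().manual_seed(seed)).tolist()
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    return train_dataset[train_idx], train_dataset[val_idx]

# Usage with a PyG dataset (indexing by a list selects a subset):
# train_set, val_set = train_val_split(dataset, val_ratio=0.1)
```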
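The experiment-setup row translates into roughly the following training scaffold: Adam with the default betas and epsilon, early stopping on validation loss with a patience of 50 epochs, and a 500-epoch cap. The model, loss, data loaders, and learning rate are placeholders, not values taken from the paper.

```python
# Hedged sketch of the reported optimization settings: Adam (betas 0.9/0.999, eps 1e-8),
# early stopping on validation loss with patience 50, and at most 500 epochs.
# `model`, `criterion`, the loaders, and `lr` are placeholders, not from the paper.
import torch

def train(model, criterion, train_loader, val_loader, lr=1e-3,
          max_epochs=500, patience=50, device="cuda"):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr,
                                 betas=(0.9, 0.999), eps=1e-8)
    best_val, epochs_without_improvement = float("inf"), 0

    for epoch in range(max_epochs):
        model.train()
        for batch in train_loader:
            batch = batch.to(device)
            optimizer.zero_grad()
            loss = criterion(model(batch), batch.y)
            loss.backward()
            optimizer.step()

        # Average validation loss for this epoch.
        model.eval()
        val_loss, n_batches = 0.0, 0
        with torch.no_grad():
            for batch in val_loader:
                batch = batch.to(device)
                val_loss += criterion(model(batch), batch.y).item()
                n_batches += 1
        val_loss /= max(n_batches, 1)

        # Early stopping: halt if validation loss has not improved for `patience` epochs.
        if val_loss < best_val:
            best_val, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
```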