Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention
Authors: Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The resulting architecture, Cluster-wise Graph Transformer (Cluster-GT), which uses node clusters as tokens and employs our proposed N2C-Attn module, shows superior performance on various graph-level tasks. Code is available at https://github.com/LUMIA-Group/Cluster-wise-Graph-Transformer. To evaluate the performance of Cluster-GT, we compare it against two categories of methods: Graph Pooling and Graph Transformers. We conduct experiments on eight graph classification datasets from different domains, including social networks and biology. |
| Researcher Affiliation | Academia | Siyuan Huang (1,2), Yunchong Song (1), Jiayue Zhou (2), Zhouhan Lin (1); 1: LUMIA Lab, Shanghai Jiao Tong University; 2: Paris Elite Institute of Technology, Shanghai Jiao Tong University |
| Pseudocode | No | The paper describes procedures and mathematical formulations but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/LUMIA-Group/Cluster-wise-Graph-Transformer. |
| Open Datasets | Yes | We conduct experiments on eight graph classification datasets from different domains, including social networks and biology. (...) Table 3: Summary statistics of datasets (IMDB-BINARY, IMDB-MULTI, COLLAB, MUTAG, PROTEINS, D&D, ZINC, MolHIV) |
| Dataset Splits | Yes | Moreover, 10 percent of the training data is allocated as validation data to ensure a fair comparison, as per [10]. We utilize a standard train/validation/test dataset split following [18]. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA RTX 3090s with 24GB of RAM. |
| Software Dependencies | No | The model is implemented using PyTorch and PyG [11]. |
| Experiment Setup | Yes | For optimization, the Adam [26] optimizer is utilized, adhering to the default settings of β1 = 0.9, β2 = 0.999, and ε = 1e-8. An early stopping criterion is implemented, halting training if there is no improvement in validation loss over 50 epochs. The training process is capped at a maximum of 500 epochs. We use a batch size of 64. |
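
For context on the Dataset Splits and Experiment Setup rows, the following is a minimal PyTorch/PyG training-loop sketch that follows the quoted settings: Adam with β1 = 0.9, β2 = 0.999, ε = 1e-8; batch size 64; a cap of 500 epochs; early stopping after 50 epochs without validation-loss improvement; and 10 percent of the training data held out for validation. The learning rate, the loss function, and the assumption that the model maps a PyG batch to class logits are illustrative placeholders, not the authors' released code (see their repository for the actual implementation).

```python
import torch
import torch.nn.functional as F
from torch.utils.data import random_split
from torch_geometric.loader import DataLoader


def train_with_early_stopping(model, train_dataset, device="cuda", lr=1e-3):
    """Illustrative loop; `model` is assumed to map a PyG batch to class logits."""
    model = model.to(device)

    # 10 percent of the training data is held out as validation data.
    n_val = len(train_dataset) // 10
    train_set, val_set = random_split(train_dataset, [len(train_dataset) - n_val, n_val])
    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)  # batch size 64
    val_loader = DataLoader(val_set, batch_size=64)

    # Adam with the quoted defaults; the learning rate is not stated in this excerpt.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999), eps=1e-8)

    best_val, patience, wait = float("inf"), 50, 0
    for epoch in range(500):  # training capped at 500 epochs
        model.train()
        for batch in train_loader:
            batch = batch.to(device)
            loss = F.cross_entropy(model(batch), batch.y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Early stopping: halt if validation loss has not improved for 50 epochs.
        model.eval()
        total = 0.0
        with torch.no_grad():
            for batch in val_loader:
                batch = batch.to(device)
                total += F.cross_entropy(model(batch), batch.y).item()
        val_loss = total / max(len(val_loader), 1)

        if val_loss < best_val:
            best_val, wait = val_loss, 0
        else:
            wait += 1
            if wait >= patience:
                break
```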