SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations
Authors: Qitian Wu, Wentao Zhao, Chenxiao Yang, Hengrui Zhang, Fan Nie, Haitian Jiang, Yatao Bian, Junchi Yan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, SGFormer successfully scales to the web-scale graph ogbn-papers100M and yields up to 141x inference acceleration over SOTA Transformers on medium-sized graphs. Experiments show that SGFormer achieves highly competitive performance in an extensive range of node property prediction datasets, which are used as common benchmarks for model evaluation w.r.t. the fundamental challenge of representation learning on graphs, when compared to powerful GNNs and state-of-the-art graph Transformers. |
| Researcher Affiliation | Collaboration | Qitian Wu (1), Wentao Zhao (1), Chenxiao Yang (1), Hengrui Zhang (2), Fan Nie (1), Haitian Jiang (3), Yatao Bian (4), Junchi Yan (1); (1) Dept. of Computer Science and Engineering & MoE Key Lab of AI, Shanghai Jiao Tong University; (2) Dept. of Computer Science, University of Illinois at Chicago; (3) Courant Institute, New York University; (4) Tencent AI Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the model components through mathematical equations but does not present them as a formal algorithm. |
| Open Source Code | Yes | The codes are publicly available at https://github.com/qitianwu/SGFormer. |
| Open Datasets | Yes | Our experiments are based on 12 real-world datasets which are all publicly available with open access. The information of these datasets is presented in Table 5. ... For citation networks, we follow the semi-supervised setting of [23] for data splits. ... We use the public splits in OGB [19]. ... We follow the splits used in the recent work [57] for Amazon2M and use random splits with the ratio 1:1:8 for pokec. ... We also follow the public OGB splits. |
| Dataset Splits | Yes | For citation networks, we follow the semi-supervised setting of [23] for data splits. ... using randomly sampled 20 instances per class as training set, 500 instances as validation set, and 1,000 instances as testing set. ... For actor and deezer-europe, we use the random splits of the benchmarking setting in [32]. ... For squirrel and chameleon, we use the splits proposed by a recent evaluation paper [38] that filters the overlapped nodes in the original datasets. ... We use the public splits in OGB [19]. ... We follow the recent work [57] using random split with the ratio 50%/25%/25%. ... For data split, we randomly divide the nodes into training, validation, and testing sets, with ratios of 10%, 10%, and 80%, respectively. ... We also follow the public OGB splits. |
| Hardware Specification | Yes | Table 4 reports the training time per epoch, inference time and GPU memory (GB) costs on a Tesla T4. |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., MLP, GNN, GCN, GAT, SGC, APPNP, JKNet, CPGNN, SIGN, GloGNN, Graphormer, GraphTrans, NodeFormer, word2vec) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We use the model performance on the validation set for hyper-parameter settings of all models including the competitors. Unless otherwise stated, the hyper-parameters are selected using grid search with the searching space: learning rate within {0.001, 0.005, 0.01, 0.05, 0.1}; weight decay within {1e-5, 1e-4, 5e-4, 1e-3, 1e-2}; hidden size within {32, 64, 128, 256}; dropout ratio within {0, 0.2, 0.3, 0.5}; number of layers within {1, 2, 3}. ... We follow the common practice for node classification tasks and set a fixed number of training epochs: 300 for medium-sized graphs, 1000 for large-sized graphs, and 50 for the extremely large graph ogbn-papers100M. |
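The Dataset Splits row above quotes several random split protocols (e.g., the 1:1:8 ratio for pokec and 10%/10%/80% elsewhere). The snippet below is a minimal sketch, not code from the SGFormer repository, of how such a random node split can be generated; the function name, seed handling, and example node count are illustrative assumptions.

```python
# Hedged sketch: generating a random 10%/10%/80% node split as quoted in the
# Dataset Splits row. Names here are illustrative, not from the SGFormer repo.
import torch

def random_node_split(num_nodes: int, train_ratio=0.10, valid_ratio=0.10, seed=0):
    """Shuffle node indices and cut them into train/valid/test index tensors."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=generator)
    n_train = int(train_ratio * num_nodes)
    n_valid = int(valid_ratio * num_nodes)
    train_idx = perm[:n_train]
    valid_idx = perm[n_train:n_train + n_valid]
    test_idx = perm[n_train + n_valid:]  # remaining ~80% of the nodes
    return train_idx, valid_idx, test_idx

# Example: a graph with 1,000 nodes yields roughly 100/100/800 nodes per split.
train_idx, valid_idx, test_idx = random_node_split(1_000)
```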
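The Experiment Setup row lists the grid-search space used for hyper-parameter selection on the validation set. Below is a hedged sketch of such a grid search: the search-space values are taken from the paper, while `train_and_evaluate` is a hypothetical stand-in for the actual training routine, which is not shown here.

```python
# Hedged sketch of the grid search described in the Experiment Setup row.
# The value ranges come from the paper; `train_and_evaluate` is hypothetical.
from itertools import product

search_space = {
    "lr":           [0.001, 0.005, 0.01, 0.05, 0.1],
    "weight_decay": [1e-5, 1e-4, 5e-4, 1e-3, 1e-2],
    "hidden_size":  [32, 64, 128, 256],
    "dropout":      [0.0, 0.2, 0.3, 0.5],
    "num_layers":   [1, 2, 3],
}

def grid_search(train_and_evaluate):
    """Try every configuration and keep the one with the best validation score."""
    best_config, best_val = None, float("-inf")
    keys = list(search_space)
    for values in product(*(search_space[k] for k in keys)):
        config = dict(zip(keys, values))
        val_score = train_and_evaluate(**config)  # assumed to return validation accuracy
        if val_score > best_val:
            best_config, best_val = config, val_score
    return best_config, best_val
```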