Even Sparser Graph Transformers
Authors: Hamed Shirzad, Honghao Lin, Balaji Venkatachalam, Ameya Velingker, David Woodruff, Danica J. Sutherland
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show (empirically and with theoretical backing) that attention scores on graphs are usually quite consistent across network widths, and use this observation to propose a two-stage procedure, which we call Spexphormer: first, train a narrow network on the full augmented graph. Next, use only the active connections to train a wider network on a much sparser graph. We establish theoretical conditions when a narrow network's attention scores can match those of a wide network, and show that Spexphormer achieves good performance with drastically reduced memory requirements on various graph datasets. Code can be found at https://github.com/hamed1375/Sp_Exphormer. Section 5 is titled "Experimental Results". (A minimal sketch of the two-stage pruning step appears after the table.) |
| Researcher Affiliation | Collaboration | Hamed Shirzad (University of British Columbia, shirzad@cs.ubc.ca); Honghao Lin (Carnegie Mellon University, honghaol@andrew.cmu.edu); Balaji Venkatachalam (Meta, bave@meta.com); Ameya Velingker (Independent Researcher, ameyav@gmail.com); David P. Woodruff (CMU & Google Research, dwoodruf@cs.cmu.edu); Danica J. Sutherland (UBC & Amii, dsuth@cs.ubc.ca) |
| Pseudocode | Yes | Algorithm 1 (Reservoir Sampling from a Node's Neighborhood) and Algorithm 2 (Neighborhood Sampling for a Batch of Nodes) are provided. (A generic reservoir-sampling sketch appears after the table.) |
| Open Source Code | Yes | Code can be found at https://github.com/hamed1375/Sp_Exphormer. |
| Open Datasets | Yes | To this end, we use Actor (Lim et al., 2021) and Photo (Shchur et al., 2018) datasets. For ogbn-arxiv we follow the standard data split provided by the original source. We then experiment on large graph datasets (ogbn-proteins, Amazon2M (Hu et al., 2021) and Pokec (Takac and Zabovsky, 2012)). |
| Dataset Splits | Yes | For the CS, Physics, Photo, and Computer datasets, we use a random train/validation/test split of 60%/20%/20%. For ogbn-arxiv we follow the standard data split provided by the original source. For the Actor dataset, we use a 50%/25%/25% split following Wu et al. (2022). For the Minesweeper and Tolokers datasets, we use the standard split from Platonov et al. (2023). We follow the standard data split for the ogbn-proteins dataset and follow Wu et al. (2024) for the dataset split on the Amazon2M and Pokec datasets, with 10%/10%/80% and 50%/25%/25% train/validation/test ratios. (A random-split sketch appears after the table.) |
| Hardware Specification | Yes | For all training runs on the medium-sized graph datasets and the final network training on the large graphs, we used A100 GPUs with 40GB of memory, as well as V100 GPUs in both 32GB and 16GB versions. For calculating the attention scores on the large graph datasets, we used Intel Xeon E5-2680 v4 CPUs with 500GB of memory. |
| Software Dependencies | No | The paper mentions using PyTorch and TensorFlow, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Other key hyperparameters can be found in Tables 6, 7, and 8. These tables provide specific values for 'L', 'ds', 'Num Epochs', 'Learning Rate', 'dl', 'degℓ', 'Number of Heads', 'Dropout', and 'Batch size' for various datasets. |
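
The Research Type row quotes the two-stage Spexphormer recipe: train a narrow network on the full augmented graph, then train a wider network using only the active connections. The sketch below illustrates just the edge-pruning step between the two stages; it is not the authors' implementation (see the linked repository), and the names `prune_edges_by_attention`, `NarrowAttentionNet`, `WideAttentionNet`, and `keep_ratio` are illustrative assumptions.

```python
import torch

def prune_edges_by_attention(edge_index, attn_scores, keep_ratio=0.2):
    """Keep the top `keep_ratio` fraction of edges, ranked by attention score."""
    num_keep = max(1, int(keep_ratio * attn_scores.numel()))
    top = torch.topk(attn_scores, num_keep).indices
    return edge_index[:, top]

# Stage 1 (assumed interface): train a narrow attention network on the full
# augmented graph and read off its per-edge attention scores.
#   narrow = NarrowAttentionNet(hidden_dim=8)          # hypothetical module
#   attn_scores = train_and_score(narrow, edge_index)  # hypothetical helper
#
# Stage 2: train a wider network on only the retained ("active") edges.
#   sparse_edge_index = prune_edges_by_attention(edge_index, attn_scores)
#   wide = WideAttentionNet(hidden_dim=64)             # hypothetical module

# Toy demonstration of the pruning step alone:
edge_index = torch.randint(0, 10, (2, 40))   # 40 random edges over 10 nodes
attn_scores = torch.rand(40)                 # stand-in for learned scores
print(prune_edges_by_attention(edge_index, attn_scores).shape)  # torch.Size([2, 8])
```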
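
The Pseudocode row refers to Algorithm 1, reservoir sampling from a node's neighborhood. Below is a generic sketch of standard reservoir sampling (Algorithm R); the paper's version may differ in details such as score-based weighting, so the function name and interface are assumptions.

```python
import random

def reservoir_sample_neighbors(neighbors, k, rng=random):
    """Uniformly sample up to k items from an iterable of neighbors in one pass."""
    reservoir = []
    for i, v in enumerate(neighbors):
        if i < k:
            reservoir.append(v)      # fill the reservoir first
        else:
            j = rng.randint(0, i)    # uniform index in [0, i], inclusive
            if j < k:
                reservoir[j] = v     # replace an entry with probability k/(i+1)
    return reservoir

# Example: sample 3 neighbors of a node with 10 neighbors.
print(reservoir_sample_neighbors(range(10), k=3))
```

The single-pass formulation is what makes this suitable for streaming over a node's (possibly large) neighborhood without materializing it.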
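
For the random 60%/20%/20% splits reported in the Dataset Splits row (CS, Physics, Photo, Computer), a plain node-mask split could look like the following; the seed handling, mask format, and function name are assumptions rather than the paper's exact protocol.

```python
import torch

def random_split_masks(num_nodes, train_frac=0.6, val_frac=0.2, seed=0):
    """Random train/val/test boolean node masks (e.g. a 60%/20%/20% split)."""
    perm = torch.randperm(num_nodes, generator=torch.Generator().manual_seed(seed))
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    index_sets = {
        "train": perm[:n_train],
        "val": perm[n_train:n_train + n_val],
        "test": perm[n_train + n_val:],
    }
    masks = {}
    for name, idx in index_sets.items():
        mask = torch.zeros(num_nodes, dtype=torch.bool)
        mask[idx] = True
        masks[name] = mask
    return masks

masks = random_split_masks(100)
print(masks["train"].sum().item(), masks["val"].sum().item(), masks["test"].sum().item())  # 60 20 20
```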