Even Sparser Graph Transformers

Authors: Hamed Shirzad, Honghao Lin, Balaji Venkatachalam, Ameya Velingker, David Woodruff, Danica J. Sutherland

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show (empirically and with theoretical backing) that attention scores on graphs are usually quite consistent across network widths, and use this observation to propose a two-stage procedure, which we call Spexphormer: first, train a narrow network on the full augmented graph. Next, use only the active connections to train a wider network on a much sparser graph. We establish theoretical conditions under which a narrow network's attention scores can match those of a wide network, and show that Spexphormer achieves good performance with drastically reduced memory requirements on various graph datasets. Code can be found at https://github.com/hamed1375/Sp_Exphormer. Section 5 is titled "Experimental Results". (A sketch of the edge-sparsification step appears after the table.)
Researcher Affiliation | Collaboration | Hamed Shirzad, University of British Columbia, shirzad@cs.ubc.ca; Honghao Lin, Carnegie Mellon University, honghaol@andrew.cmu.edu; Balaji Venkatachalam, Meta, bave@meta.com; Ameya Velingker, Independent Researcher, ameyav@gmail.com; David P. Woodruff, CMU & Google Research, dwoodruf@cs.cmu.edu; Danica J. Sutherland, UBC & Amii, dsuth@cs.ubc.ca
Pseudocode | Yes | Algorithm 1 (Reservoir Sampling from a Node's Neighborhood) and Algorithm 2 (Neighborhood Sampling for a Batch of Nodes) are provided. (A generic reservoir-sampling sketch appears after the table.)
Open Source Code | Yes | Code can be found at https://github.com/hamed1375/Sp_Exphormer.
Open Datasets | Yes | To this end, we use the Actor (Lim et al., 2021) and Photo (Shchur et al., 2018) datasets. For ogbn-arxiv we follow the standard data split provided by the original source. We then experiment on large graph datasets (ogbn-proteins, Amazon2M (Hu et al., 2021), and Pokec (Takac and Zabovsky, 2012)).
Dataset Splits | Yes | For the CS, Physics, Photo, and Computer datasets, we use a random train/validation/test split of 60%/20%/20%. For ogbn-arxiv we follow the standard data split provided by the original source. For the Actor dataset, we use a 50%/25%/25% split following Wu et al. (2022). For the Minesweeper and Tolokers datasets, we use the standard split from Platonov et al. (2023). We follow the standard data split for the ogbn-proteins dataset and follow Wu et al. (2024) for the dataset split on the Amazon2M and Pokec datasets, with 10%/10%/80% and 50%/25%/25% train/validation/test ratios. (A random-split sketch appears after the table.)
Hardware Specification | Yes | For all training runs on the medium-sized graph datasets, and for the final network training on the large graphs, we used A100 GPUs with 40GB of memory and V100 GPUs in both 32GB and 16GB versions. For calculating the attention scores on the large graph datasets, we used Intel Xeon E5-2680 v4 CPUs with 500GB of memory.
Software Dependencies | No | The paper mentions using PyTorch and TensorFlow, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Other key hyperparameters can be found in Tables 6, 7, and 8. These tables provide specific values for 'L', 'ds', 'Num Epochs', 'Learning Rate', 'dl', 'degℓ', 'Number of Heads', 'Dropout', and 'Batch size' for various datasets.
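
The edge-sparsification step referenced in the Research Type row can be pictured as follows. This is a minimal Python/PyTorch sketch, assuming a top-k-per-destination rule for deciding which connections are "active"; the function name, the demo edge list, and the random scores are illustrative stand-ins, not the authors' implementation (which first trains a narrow Spexphormer and reads scores from its attention layers).

    import torch

    def top_k_edges_per_destination(edge_index, scores, k):
        """Keep, for each destination node, the k incident edges with the
        highest attention scores; all other edges are dropped.

        edge_index: LongTensor of shape (2, E) with rows (src, dst).
        scores:     FloatTensor of shape (E,), e.g. attention scores
                    produced by the trained narrow network.
        """
        dst = edge_index[1]
        keep = torch.zeros(scores.numel(), dtype=torch.bool)
        for node in dst.unique():
            mask = (dst == node).nonzero(as_tuple=True)[0]
            top = scores[mask].topk(min(k, mask.numel())).indices
            keep[mask[top]] = True
        return edge_index[:, keep]

    # Tiny demo with random scores standing in for the narrow network's output;
    # the wider network would then be trained on sparse_edge_index only.
    edge_index = torch.tensor([[0, 1, 2, 3, 0, 2],
                               [1, 1, 1, 2, 2, 0]])
    scores = torch.rand(edge_index.size(1))
    sparse_edge_index = top_k_edges_per_destination(edge_index, scores, k=2)
    print(sparse_edge_index)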
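For the Pseudocode row: Algorithm 1 samples from a node's neighborhood with a reservoir. The sketch below is the textbook uniform variant (Algorithm R), included only for context; the paper's variant and any score-based weighting it applies may differ.

    import random

    def reservoir_sample_neighborhood(neighbors, k, rng=random):
        """Sample up to k neighbor ids uniformly in a single pass over an
        iterable of neighbors, using standard reservoir sampling."""
        reservoir = []
        for i, v in enumerate(neighbors):
            if i < k:
                reservoir.append(v)          # fill the reservoir first
            else:
                j = rng.randint(0, i)        # uniform over [0, i], inclusive
                if j < k:
                    reservoir[j] = v         # replace with probability k/(i+1)
        return reservoir

    # Example: sample 4 neighbors of a node that has 100 neighbors.
    print(reservoir_sample_neighborhood(range(100), k=4))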
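For the Dataset Splits row, a random 60%/20%/20% node split can be produced as below. The seed handling, node count, and any stratification used in the paper are not specified here, so treat this purely as a sketch.

    import torch

    def random_node_split(num_nodes, train_frac=0.6, val_frac=0.2, seed=0):
        # Shuffle node indices reproducibly, then cut into train/val/test.
        g = torch.Generator().manual_seed(seed)
        perm = torch.randperm(num_nodes, generator=g)
        n_train = int(train_frac * num_nodes)
        n_val = int(val_frac * num_nodes)
        return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

    # Example with a placeholder node count.
    train_idx, val_idx, test_idx = random_node_split(num_nodes=10000)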