Representing Long-Range Context for Graph Neural Networks with Global Attention

Authors: Zhanghao Wu, Paras Jain, Matthew Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our simple architecture leads to state-of-the-art results on several graph classification tasks, outperforming methods that explicitly encode graph structure. (Section 5: Experiments)
Researcher Affiliation | Collaboration | UC Berkeley; Google Brain
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for Graph Trans is available at https://github.com/ucbrise/graphtrans.
Open Datasets | Yes | We choose two commonly used graph classification benchmarks, NCI1 and NCI109 [31]. For the chemical benchmark, we evaluate our Graph Trans on a dataset larger than the NCI datasets: molpcba from the Open Graph Benchmark (OGB) [15]. For the computer programming benchmark, we also adopt a large dataset, code2, from OGB... (See the loading-and-split sketch after this table.)
Dataset Splits | Yes | We follow the settings in [20, 2] for NCI1 and NCI109, randomly splitting each dataset into training, validation, and test sets by a ratio of 8:1:1. (See the loading-and-split sketch after this table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions the Adam optimizer and common model architectures but does not specify version numbers for key software components or libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | All of our models are trained with the Adam optimizer [17] with a learning rate of 0.0001, a weight decay of 0.0001, and the default Adam β parameters. All Transformer modules used in our experiments have an embedding dimension d_TF of 128 and a hidden dimension of 512 in the feedforward subnetwork. We trained Graph Trans on both the NCI1 and NCI109 datasets for 100 epochs with a batch size of 256. For both GNN and Transformer modules, we apply a dropout of 0.3. We train all the models for 30 epochs with a batch size of 16... (See the training-setup sketch after this table.)
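The dataset rows above can be made concrete with a short sketch. This is not the authors' code: it assumes PyTorch Geometric's TUDataset loader for NCI1/NCI109 and OGB's PygGraphPropPredDataset for molpcba, and the split seed is arbitrary; only the dataset names and the 8:1:1 ratio come from the table.

# Sketch only (not the authors' code): load the benchmarks named above and
# reproduce an 8:1:1 random split for NCI1/NCI109. The loaders and the split
# seed are assumptions; the dataset names and the ratio come from the paper.
import torch
from torch_geometric.datasets import TUDataset
from ogb.graphproppred import PygGraphPropPredDataset

# NCI1 / NCI109: random 8:1:1 train/validation/test split.
dataset = TUDataset(root="data/TUDataset", name="NCI1")
perm = torch.randperm(len(dataset), generator=torch.Generator().manual_seed(0))
n_train, n_val = int(0.8 * len(dataset)), int(0.1 * len(dataset))
train_set = dataset[perm[:n_train]]
val_set = dataset[perm[n_train:n_train + n_val]]
test_set = dataset[perm[n_train + n_val:]]

# molpcba and code2 come from the Open Graph Benchmark and ship with
# official splits, so no manual ratio is needed there.
ogb_dataset = PygGraphPropPredDataset(name="ogbg-molpcba", root="data/OGB")
split_idx = ogb_dataset.get_idx_split()
ogb_train = ogb_dataset[split_idx["train"]]
ogb_valid = ogb_dataset[split_idx["valid"]]
ogb_test = ogb_dataset[split_idx["test"]]

From here, torch_geometric.loader.DataLoader would batch these subsets for training; that detail is omitted.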
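The Experiment Setup row lists concrete hyperparameters; the sketch below shows one way they could map onto standard PyTorch components. The attention-head count, Transformer depth, and overall model wiring are not given in the quoted excerpt and are marked as assumptions.

# Sketch of the quoted hyperparameters (not the authors' implementation).
# Learning rate, weight decay, dimensions, dropout, epochs, and batch sizes
# come from the Experiment Setup row; head count and depth are assumptions.
import torch
import torch.nn as nn

D_TF = 128        # Transformer embedding dimension d_TF
FFN_DIM = 512     # hidden dimension of the feedforward subnetwork
DROPOUT = 0.3     # applied to both the GNN and the Transformer modules
NUM_HEADS = 4     # assumption: not stated in the quoted setup
NUM_LAYERS = 4    # assumption: not stated in the quoted setup

encoder_layer = nn.TransformerEncoderLayer(
    d_model=D_TF, nhead=NUM_HEADS, dim_feedforward=FFN_DIM, dropout=DROPOUT
)
transformer = nn.TransformerEncoder(encoder_layer, num_layers=NUM_LAYERS)

optimizer = torch.optim.Adam(
    transformer.parameters(),  # in the paper, GNN + Transformer parameters
    lr=1e-4,
    weight_decay=1e-4,
    betas=(0.9, 0.999),        # the default Adam betas
)

# NCI1/NCI109: 100 epochs at batch size 256; the larger OGB benchmarks:
# 30 epochs at batch size 16, per the quoted setup.
EPOCHS_NCI, BATCH_SIZE_NCI = 100, 256
EPOCHS_OGB, BATCH_SIZE_OGB = 30, 16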