Relational Attention: Generalizing Transformers for Graph-Structured Tasks
Authors: Cameron Diao, Ricky Loynd
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate this relational transformer on a diverse array of graph-structured tasks, including the large and challenging CLRS Algorithmic Reasoning Benchmark. Our analysis demonstrates that these gains are attributable to relational attention's inherent ability to leverage the greater expressivity of graphs over sets. We evaluate RT against common GNNs on the diverse set of graph-structured tasks provided by CLRS-30 (Veličković et al., 2022). |
| Researcher Affiliation | Collaboration | Cameron Diao Department of Computer Science Rice University cwd2@rice.edu Ricky Loynd Microsoft Research riloynd@microsoft.com |
| Pseudocode | No | The paper describes mathematical equations and a process, but does not include a block labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | We introduce the relational transformer for application to arbitrary graph-structured tasks, and make the implementation available at https://github.com/CameronDiao/relational-transformer. |
| Open Datasets | Yes | We evaluate RT against common GNNs on the diverse set of graph-structured tasks provided by CLRS-30 (Veličković et al., 2022). CLRS-30 provides canonical datasets (training, validation, and test) which can also be generated from specific random seeds: 1, 2, 3. |
| Dataset Splits | Yes | CLRS-30 provides canonical datasets (training, validation, and test) which can also be generated from specific random seeds: 1, 2, 3. The graphs in the training and validation datasets contain 16 nodes, while the test graphs are of size 64 to evaluate the out-of-distribution (OOD) generalization of models. During training, the model is evaluated on the validation set after every 320 examples. |
| Hardware Specification | Yes | Training speed in examples per second on a T4 GPU, on the reference algorithm Bellman Ford. |
| Software Dependencies | No | The paper mentions that 'the CLRS-30 framework is written in Jax' but does not specify a version number for Jax or any other software dependency. |
| Experiment Setup | Yes | To tune the hyperparameters of RT and the CLRS-30 baseline GNNs, we used Distributed Grid Descent (DGD) (Loynd et al., 2020), a self-guided form of random search. Table 2 lists the tuned hyperparameter values for CLRS-30 experiments, and Table 3 reports the sets of values considered in those searches. |
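The core idea quoted above, attention that exploits edge (relation) features rather than treating nodes as an unordered set, can be sketched in a few lines. The following is a minimal single-head NumPy sketch under assumed shapes and names; it is not the authors' implementation (the full relational transformer also updates edge representations and uses multiple heads), but it illustrates how keys and values are conditioned on the per-pair edge features.

```python
import numpy as np

def relational_attention(X, E, Wq, Wk, Wv):
    """Simplified single-head sketch of edge-conditioned attention.

    X  : (n, d)      node features
    E  : (n, n, de)  edge features, E[i, j] for the directed pair i -> j
    Wq : (d, dk)     query projection (per node)
    Wk : (d + de, dk) key projection over [sender node ; edge] features
    Wv : (d + de, dk) value projection over [sender node ; edge] features
    """
    n, d = X.shape
    Q = X @ Wq                                        # (n, dk), one query per node
    # Per-pair inputs: concatenate the attended node's features with the
    # edge features connecting the pair, so scores depend on the relation.
    pair = np.concatenate(
        [np.broadcast_to(X[None, :, :], (n, n, d)), E], axis=-1)  # (n, n, d+de)
    K = pair @ Wk                                     # (n, n, dk)
    V = pair @ Wv                                     # (n, n, dk)
    scores = np.einsum('id,ijd->ij', Q, K) / np.sqrt(K.shape[-1])
    # Numerically stable softmax over the neighbor axis j.
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)
    return np.einsum('ij,ijd->id', A, V)              # (n, dk) updated node states
```

Because the edge features enter both keys and values, two neighbors with identical node features can still receive different attention weights, which is the sense in which graphs are more expressive than sets for this mechanism.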