Understanding Transformer Reasoning Capabilities via Graph Algorithms
Authors: Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, Vahab Mirrokni
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark. These results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks. |
| Researcher Affiliation | Collaboration | 1 Google Research, 2 Columbia University, 3 Google, 4 Google DeepMind |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | While code is not made available, the data can be generated using the GraphQA dataset linked in the appendix. Full training details are made available, which makes the code easily reproducible. The code will be open sourced upon acceptance of the paper. |
| Open Datasets | Yes | We use the GraphQA benchmark tasks [24] for our experiments. We used the public code of the dataset available at https://github.com/google-research/google-research/tree/master/graphqa. |
| Dataset Splits | Yes | There are 1,000 graphs in the original train set, 500 graphs in the dev set, and 500 graphs in the test set. ... Optimal hyperparameters for each task and model were determined by training on the GraphQA Train dataset and evaluating performance on the GraphQA Dev dataset. |
| Hardware Specification | Yes | All experiments were conducted using Google's TPUv3 and TPUv5e accelerators [35]. |
| Software Dependencies | No | We implemented our model in JAX [26] and used AdamW [43, 50] as the optimizer. ... We employed the GLU [73] activation as a non-linearity. The paper does not specify version numbers for JAX or the other software dependencies. |
| Experiment Setup | Yes | We fixed the number of iterations as 1,000,000 and trained standard decoder-only transformers with L = 12 layers, m = 768 embedding dimension, H = 12 heads, learning rate 5 × 10⁻⁴, and dropout 0.1. |
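
The software and experiment-setup rows above pin down the optimizer, model size, and iteration count, but not the surrounding training code. The following is a minimal JAX/Optax sketch that wires the quoted values (AdamW, learning rate 5 × 10⁻⁴, dropout 0.1, 1,000,000 iterations, 1,000/500/500 GraphQA splits) into a generic training loop; the model parameters, loss function, data iterator, and any weight-decay or learning-rate schedule are hypothetical placeholders, not the authors' implementation.

```python
# Hedged sketch only: the hyperparameters are those quoted in the table above;
# the model, loss, and data pipeline are hypothetical placeholders.
import jax
import optax

# Quoted experiment setup.
NUM_LAYERS = 12            # L = 12 decoder layers
EMBED_DIM = 768            # m = 768 embedding dimension
NUM_HEADS = 12             # H = 12 attention heads
LEARNING_RATE = 5e-4       # learning rate 5 x 10^-4
DROPOUT_RATE = 0.1         # dropout 0.1
NUM_ITERATIONS = 1_000_000

# Quoted GraphQA split sizes (number of graphs per split).
TRAIN_GRAPHS, DEV_GRAPHS, TEST_GRAPHS = 1_000, 500, 500

# AdamW as stated in the paper; weight decay and schedule are not specified,
# so Optax defaults are used here as an assumption.
optimizer = optax.adamw(learning_rate=LEARNING_RATE)


def train(params, loss_fn, data_iterator):
    """Generic AdamW loop; `params`, `loss_fn`, and `data_iterator` are placeholders."""
    opt_state = optimizer.init(params)

    @jax.jit
    def step(params, opt_state, batch):
        loss, grads = jax.value_and_grad(loss_fn)(params, batch)
        updates, opt_state = optimizer.update(grads, opt_state, params)
        params = optax.apply_updates(params, updates)
        return params, opt_state, loss

    for _ in range(NUM_ITERATIONS):
        params, opt_state, loss = step(params, opt_state, next(data_iterator))
    return params
```

Per the dataset-splits row, hyperparameter selection would then train on the GraphQA Train split and pick the best configuration by performance on the GraphQA Dev split before reporting on the test split.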