Transformers Meet Directed Graphs

Authors: Simon Geisler, Yujia Li, Daniel J. Mankowitz, Ali Taylan Cemgil, Stephan Günnemann, Cosmin Paduraru

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirically, we show that the extra directionality information is useful in various downstream tasks, including correctness testing of sorting networks and source code understanding. Together with a data-flow-centric graph construction, our model outperforms the prior state of the art on the Open Graph Benchmark Code2 relatively by 14.7%. |
| Researcher Affiliation | Collaboration | ¹Dept. of Computer Science & Munich Data Science Institute, Technical University of Munich; ²Google DeepMind |
| Pseudocode | Yes | Algorithm D.1 Normalize Eigenvectors; Algorithm F.1 Magnetic Laplacian Positional Encodings; Algorithm K.1 Generate Sorting Network. (A sketch of the positional-encoding computation follows the table.) |
| Open Source Code | Yes | Code and configuration: www.cs.cit.tum.de/daml/digraph-transformer |
| Open Datasets | Yes | We set a new state of the art on the OGB Code2 dataset (2.85% higher F1 score, 14.7% relatively) for function name prediction (§7). |
| Dataset Splits | Yes | For the regression tasks, we sample graphs with 16 to 63, 64 to 71, and 72 to 83 nodes for train, validation, and test, respectively. ... We construct a dataset consisting of 800,000 training instances for equally probable sequence lengths 7 ≤ p_train ≤ 11, generate the validation data with p_val = 12, and assess performance on sequence lengths 13 ≤ p_test ≤ 16. (The size-based split rule is restated as code below the table.) |
| Hardware Specification | Yes | For the playground classification tasks (§5), we train on one NVIDIA GeForce GTX 1080 Ti with 11 GB RAM. Regression as well as sorting network results are obtained with a V100 with 40 GB RAM. For training the models on the function name prediction dataset, we used four Google Cloud TPUv4 (behaves like 8 distributed devices). |
| Software Dependencies | No | The paper mentions using JAX for the experiments, along with techniques such as AdamW, adaptive gradient clipping, and cosine annealing, but it does not specify version numbers for JAX or any other software libraries or dependencies. |
| Experiment Setup | Yes | We choose the hyperparameters for each model based on a random search over the important learning parameters like learning rate, weight decay, and the parameters of AdamW... We list the important hyperparameters in Table G.1. (A sketch of such a random search follows the table.) |
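
The Pseudocode row references Algorithm F.1, Magnetic Laplacian Positional Encodings. Below is a minimal NumPy sketch of how such encodings can be computed from the paper's published construction: symmetrize the adjacency, encode edge direction in a complex phase, and take eigenvectors of the resulting Hermitian Laplacian. The function name `magnetic_laplacian_pe` and the defaults `q = 0.25` and `k = 8` are illustrative assumptions; the paper's actual algorithm additionally phase-normalizes the eigenvectors (Algorithm D.1).

```python
import numpy as np

def magnetic_laplacian_pe(A: np.ndarray, q: float = 0.25, k: int = 8) -> np.ndarray:
    """Sketch of magnetic Laplacian positional encodings.

    A: dense 0/1 adjacency matrix of a directed graph (n x n).
    Returns an (n, 2k) real array: real and imaginary parts of the
    eigenvectors belonging to the k smallest eigenvalues.
    """
    A_s = np.logical_or(A, A.T).astype(float)   # symmetrized adjacency
    D_s = np.diag(A_s.sum(axis=1))              # degree matrix of A_s
    # Phase encodes direction: +2*pi*q for u->v, -2*pi*q for v->u,
    # and 0 if the edge is bidirectional or absent.
    theta = 2.0 * np.pi * q * (A - A.T)
    L = D_s - A_s * np.exp(1j * theta)          # Hermitian by construction
    eigvals, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    pe = eigvecs[:, :k]                         # complex (n, k) encodings
    return np.concatenate([pe.real, pe.imag], axis=-1)
```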
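
The size-based regression splits quoted under Dataset Splits amount to routing each sampled graph by its node count alone. The helper below restates that rule; the name `split_of` is ours.

```python
def split_of(num_nodes: int) -> str:
    """Route a regression-task graph to its split by node count,
    per the ranges quoted in the Dataset Splits row."""
    if 16 <= num_nodes <= 63:
        return "train"
    if 64 <= num_nodes <= 71:
        return "validation"
    if 72 <= num_nodes <= 83:
        return "test"
    raise ValueError("node count outside the sampled ranges")
```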
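
The Experiment Setup row describes a random search over learning rate, weight decay, and the AdamW parameters. The sketch below illustrates one way to drive such a search; the sampling ranges and the `train`/`validate` hooks are assumptions for illustration, since the paper reports its actual values only in Table G.1.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_config() -> dict:
    """Draw one candidate configuration (illustrative ranges)."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -3),  # log-uniform
        "weight_decay": 10 ** rng.uniform(-6, -2),   # log-uniform
        "adamw_beta1": float(rng.choice([0.9, 0.95])),
        "adamw_beta2": float(rng.choice([0.98, 0.999])),
    }

candidates = [sample_config() for _ in range(20)]
# best = max(candidates, key=lambda cfg: validate(train(cfg)))  # hypothetical hooks
```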