Transformers over Directed Acyclic Graphs

Authors: Yuankai Luo, Veronika Thost, Lei Shi

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously evaluate our proposal in ablation studies and show that it successfully improves different kinds of baseline transformers, from vanilla transformers [Vaswani et al., 2017] to state-of-the-art graph transformers [Wu et al., 2021, Chen et al., 2022a, Rampášek et al., 2022, Wu et al., 2022], over various types of DAG data. Our experiments range from classifying source code graphs to nodes in citation networks, and even go far beyond the problem scope of related works. (A sketch of the DAG-masked attention idea follows the table.)
Researcher Affiliation | Collaboration | Yuankai Luo, Beihang University (luoyk@buaa.edu.cn); Veronika Thost, MIT-IBM Watson AI Lab, IBM Research (veronika.thost@ibm.com); Lei Shi, Beihang University (leishi@buaa.edu.cn)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks with clear labels such as "Pseudocode" or "Algorithm".
Open Source Code | Yes | Our implementation is available at https://github.com/LUOyk1999/DAGformer.
Open Datasets | Yes | ogbg-code2 [Hu et al., 2020]: A large dataset of ASTs derived from Python methods... NA [Zhang et al., 2019]: A dataset with much smaller graphs, containing neural architecture DAGs generated by the ENAS software... Self-citation [ARC, 2021, Luo et al., 2023]: Each DAG in the academic self-citation data represents a scholar's academic self-citation network [ARC, 2021]... Cora, Citeseer, Pubmed [Sen et al., 2008]: Established, medium-sized citation graphs. (A dataset-loading sketch follows the table.)
Dataset Splits | Yes | We use the standard evaluation metric (F1 score) and splits provided by [Hu et al., 2020]. ... We use the same splits as [Thost and Chen, 2021]. ... We randomly split the dataset into 80% training, 10% validation, and 10% test sets. ... We use the same evaluation metric (classification accuracy) and splits provided by NodeFormer [Wu et al., 2022]. (A split sketch follows the table.)
Hardware Specification | Yes | The experiments are conducted with two RTX 3090 GPUs.
Software Dependencies | No | The paper states "Our implementation is based on PyG [Fey and Lenssen, 2019]", but it does not provide specific version numbers for PyG or other key software dependencies (e.g., Python, PyTorch).
Experiment Setup | Yes | Table 9: Hyperparameter search on different datasets. (A grid-search sketch follows the table.)

Hyperparameter | NA | Self-citation | Cora, Citeseer and Pubmed
# Layers | {2, 3} | {2, 3, 4, 5} | {2, 3, 4, 5}
Hidden dimensions | {32, 128} | {32, 64, 128} | {16, 32, 64, 128}
Dropout | 0.2 | {0.1, 0.2, 0.5} | 0.0
Learning rate | 1e-3 | {1e-4, 5e-4, 1e-3, 5e-3} | {1e-3, 1e-2}
# Epochs | 100 | {50, 100} | 1000
Weight decay | None | 1e-6 | 5e-3
# Attention heads | {2, 4} | {2, 4, 8} | {2, 4}
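
For context on the method under evaluation, here is a minimal sketch of reachability-masked attention, one plausible reading of the paper's DAG attention over each node's ancestors and descendants. The function dag_attention_mask and every implementation detail below are illustrative assumptions, not code from the DAGformer repository.

import networkx as nx
import torch

def dag_attention_mask(edges, num_nodes):
    # Boolean (num_nodes, num_nodes) mask: True where attention is allowed,
    # i.e. between each node and its ancestors/descendants (and itself).
    g = nx.DiGraph(edges)
    g.add_nodes_from(range(num_nodes))
    closure = nx.transitive_closure(g, reflexive=True)  # reachability relation
    mask = torch.zeros(num_nodes, num_nodes, dtype=torch.bool)
    for u, v in closure.edges():
        mask[u, v] = True  # v is reachable from u
        mask[v, u] = True  # allow attention in both directions
    return mask

# Plug the mask into a standard multi-head attention layer:
edges = [(0, 1), (1, 2), (0, 3)]
x = torch.randn(1, 4, 16)  # one graph with 4 nodes, 16-dim node features
mha = torch.nn.MultiheadAttention(16, num_heads=2, batch_first=True)
allowed = dag_attention_mask(edges, num_nodes=4)
out, _ = mha(x, x, x, attn_mask=~allowed)  # True entries are masked out

Because the closure is reflexive, every node attends at least to itself, so no attention row is fully masked.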
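
All four dataset groups quoted in the Open Datasets row are publicly available. A minimal loading sketch, assuming the ogb and torch_geometric packages are installed (the root paths are illustrative):

from ogb.graphproppred import PygGraphPropPredDataset
from torch_geometric.datasets import Planetoid

# ogbg-code2: Python ASTs with graph-level targets [Hu et al., 2020]
code2 = PygGraphPropPredDataset(name="ogbg-code2", root="data/ogb")
split_idx = code2.get_idx_split()  # the standard train/valid/test indices

# Cora, Citeseer, Pubmed: citation graphs [Sen et al., 2008]
cora = Planetoid(root="data/planetoid", name="Cora")
print(code2[0], cora[0])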
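
The 80%/10%/10% random split reported for the self-citation dataset could be reproduced along these lines; the seed and the absence of stratification are assumptions:

import torch

def random_split(num_graphs, seed=0):
    # Shuffle indices once, then cut into 80/10/10 train/valid/test parts.
    perm = torch.randperm(num_graphs, generator=torch.Generator().manual_seed(seed))
    n_train, n_valid = int(0.8 * num_graphs), int(0.1 * num_graphs)
    return perm[:n_train], perm[n_train:n_train + n_valid], perm[n_train + n_valid:]

train_idx, valid_idx, test_idx = random_split(1000)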
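
The Table 9 ranges define a straightforward grid search. The sketch below sweeps the Self-citation column; train_and_eval is a hypothetical placeholder for the paper's training loop, not a function from the repository:

import itertools

grid = {
    "num_layers": [2, 3, 4, 5],
    "hidden_dim": [32, 64, 128],
    "dropout": [0.1, 0.2, 0.5],
    "lr": [1e-4, 5e-4, 1e-3, 5e-3],
    "epochs": [50, 100],
    "weight_decay": [1e-6],
    "num_heads": [2, 4, 8],
}

def train_and_eval(config):
    # Placeholder: train the model with `config` and return validation accuracy.
    raise NotImplementedError

best_score, best_config = -1.0, None
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = train_and_eval(config)
    if score > best_score:
        best_score, best_config = score, config
print(best_score, best_config)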