Transformers over Directed Acyclic Graphs
Authors: Yuankai Luo, Veronika Thost, Lei Shi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously evaluate our proposal in ablation studies and show that it successfully improves different kinds of baseline transformers, from vanilla transformers [Vaswani et al., 2017] to state-of-the-art graph transformers [Wu et al., 2021, Chen et al., 2022a, Rampášek et al., 2022, Wu et al., 2022], over various types of DAG data. Our experiments range from classifying source code graphs to nodes in citation networks, and even go far beyond the problem scope of related works. |
| Researcher Affiliation | Collaboration | Yuankai Luo (Beihang University, luoyk@buaa.edu.cn); Veronika Thost (MIT-IBM Watson AI Lab, IBM Research, veronika.thost@ibm.com); Lei Shi (Beihang University, leishi@buaa.edu.cn) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks with clear labels like "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | Our implementation is available at https://github.com/LUOyk1999/DAGformer. |
| Open Datasets | Yes | ogbg-code2 [Hu et al., 2020]. A large dataset of ASTs derived from Python methods... NA [Zhang et al., 2019]. A dataset with much smaller graphs, containing neural architecture DAGs generated by the ENAS software... Self-citation [ARC, 2021, Luo et al., 2023]. Each DAG in the academic self-citation dataset represents a scholar's academic self-citation network [ARC, 2021]... Cora, Citeseer, Pubmed [Sen et al., 2008]. Established, medium-sized citation graphs. (A loading sketch for two of these datasets appears after the table.) |
| Dataset Splits | Yes | We use the standard evaluation metric (F1 score) and splits provided by [Hu et al., 2020]. ... We use the splits as used in [Thost and Chen, 2021]. ... We randomly split the dataset into 80% training, 10% validation and 10% test sets. ... We use the same evaluation metric (classification accuracy) and splits provided by NodeFormer [Wu et al., 2022]. (A split sketch appears after the table.) |
| Hardware Specification | Yes | The experiments are conducted with two RTX 3090 GPUs. |
| Software Dependencies | No | The paper states "Our implementation is based on PyG [Fey and Lenssen, 2019]", but it does not provide specific version numbers for PyG or other key software dependencies (e.g., Python, PyTorch). (An environment-recording sketch appears after the table.) |
| Experiment Setup | Yes | Table 9: Hyperparameter search on different datasets, listed per hyperparameter as (NA; Self-citation; Cora/Citeseer/Pubmed). # Layers: {2, 3}; {2, 3, 4, 5}; {2, 3, 4, 5}. Hidden dimensions: {32, 128}; {32, 64, 128}; {16, 32, 64, 128}. Dropout: 0.2; {0.1, 0.2, 0.5}; 0.0. Learning rate: 1e-3; {1e-4, 5e-4, 1e-3, 5e-3}; {1e-3, 1e-2}. # Epochs: 100; {50, 100}; 1000. Weight decay: None; 1e-6; 5e-3. # Attention heads: {2, 4}; {2, 4, 8}; {2, 4}. (A grid-search sketch appears after the table.) |
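
All four dataset families in the Open Datasets row are public. As a minimal sketch (not the authors' pipeline), ogbg-code2 can be loaded through OGB's PyG interface and the citation graphs through `torch_geometric.datasets.Planetoid`; the `root` paths are arbitrary local choices:

```python
# Minimal loading sketch for two of the public datasets named above.
# Assumes the `ogb` and `torch_geometric` packages are installed;
# the root directories are arbitrary local paths.
from ogb.graphproppred import PygGraphPropPredDataset
from torch_geometric.datasets import Planetoid

# ogbg-code2: ASTs derived from Python methods (graph-level prediction).
code2 = PygGraphPropPredDataset(name="ogbg-code2", root="data/ogb")

# Cora (Citeseer and Pubmed load the same way): citation graphs
# for node classification.
cora = Planetoid(root="data/planetoid", name="Cora")

print(len(code2))   # number of AST graphs
print(cora[0])      # one Data object holding the full citation graph
```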
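For the split protocols quoted in the Dataset Splits row, ogbg-code2 ships a standard split, while the self-citation data is described as an 80/10/10 random split. A hedged sketch; the helper name and seed below are our own assumptions, not from the paper or the released code:

```python
import torch
from ogb.graphproppred import PygGraphPropPredDataset

# Standard OGB split: index tensors keyed "train" / "valid" / "test".
dataset = PygGraphPropPredDataset(name="ogbg-code2", root="data/ogb")
split_idx = dataset.get_idx_split()
train_graphs = dataset[split_idx["train"]]

# Illustrative 80/10/10 random split, as described for the
# self-citation dataset; seed and function name are assumptions.
def random_split_indices(num_items, seed=0):
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_items, generator=gen)
    n_train, n_valid = int(0.8 * num_items), int(0.1 * num_items)
    return (perm[:n_train],                   # training indices
            perm[n_train:n_train + n_valid],  # validation indices
            perm[n_train + n_valid:])         # test indices
```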
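Since the paper pins no dependency versions (see the Software Dependencies row), anyone reproducing the results has to record their own environment; a tiny sketch:

```python
# Record the environment actually used, since the paper pins no versions.
import sys
import torch
import torch_geometric

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("PyG:", torch_geometric.__version__)
print("CUDA:", torch.version.cuda)  # None on CPU-only builds
```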
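The Table 9 search space can be swept with a plain grid. Below is a sketch for the Cora/Citeseer/Pubmed column; `train_and_evaluate` is a hypothetical entry point, not a function from the released code:

```python
from itertools import product

# Table 9 search space for Cora, Citeseer and Pubmed, as quoted above.
grid = {
    "num_layers":   [2, 3, 4, 5],
    "hidden_dim":   [16, 32, 64, 128],
    "dropout":      [0.0],
    "lr":           [1e-3, 1e-2],
    "epochs":       [1000],
    "weight_decay": [5e-3],
    "num_heads":    [2, 4],
}

# Enumerate every configuration in the grid.
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # score = train_and_evaluate(config)  # hypothetical training entry point
    print(config)
```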