Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Pure Transformers are Powerful Graph Learners
Authors: Jinwoo Kim, Dat Nguyen, Seonwoo Min, Sungjun Cho, Moontae Lee, Honglak Lee, Seunghoon Hong
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first conduct a synthetic experiment that directly confirms our key claims in Lemma 1 (Section 3). Then, we empirically explore the capability of Tokenized Graph Transformer (TokenGT) (Section 2) using the PCQM4Mv2 large-scale quantum chemistry regression dataset [27]. In an experiment with PCQM4Mv2 large-scale dataset, we show that Tokenized Graph Transformer (TokenGT) performs significantly better than all GNNs and is competitive with Transformer variants with strong graph-specific architectural components [78, 29, 54]. |
| Researcher Affiliation | Collaboration | KAIST; LG AI Research; University of Illinois Chicago |
| Pseudocode | No | The paper describes the architecture and components of TokenGT, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/jw9730/tokengt. |
| Open Datasets | Yes | We test our model, named Tokenized Graph Transformer (TokenGT), mainly on the PCQM4Mv2 large-scale quantum chemical property prediction dataset containing 3.7M molecular graphs [27]. |
| Dataset Splits | Yes | We report the Mean Absolute Error (MAE) on the validation set, and report MAE on the unavailable test set if possible. For fine-tuning, we use 1k warmup, 0.1M training steps, and cosine learning rate decay. |
| Hardware Specification | Yes | We train the models on 8 RTX 3090 GPUs for 3 days. |
| Software Dependencies | No | The paper mentions using 'AdamW optimizer' but does not specify version numbers for any key software components like deep learning frameworks (e.g., PyTorch, TensorFlow), Python, or CUDA. |
| Experiment Setup | Yes | For TokenGT, we use both node and type identifiers, and use main Transformer encoder configuration based on Graphormer [78] with 12 layers, 768 hidden dimension, and 32 attention heads. We use AdamW optimizer with (β1, β2) = (0.99, 0.999) and weight decay 0.1, and 60k learning rate warmup steps followed by linear decay over 1M iterations with batch size 1024. For fine-tuning, we use 1k warmup, 0.1M training steps, and cosine learning rate decay. |
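
The learning rate schedules quoted in the Experiment Setup row can be sketched in plain Python. This is an illustrative reading of the quoted text, not the authors' implementation: the function names are hypothetical, the peak learning rate is a placeholder (the quoted text does not state one), and the sketch assumes that the warmup steps count toward the total step budget and that the rate decays to zero.

```python
import math


def warmup_linear_decay(step, peak_lr, warmup_steps=60_000, total_steps=1_000_000):
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps.

    Assumption: warmup_steps are included in total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


def warmup_cosine_decay(step, peak_lr, warmup_steps=1_000, total_steps=100_000):
    """Linear warmup to peak_lr, then cosine decay to 0 at total_steps.

    Defaults follow the quoted fine-tuning setup (1k warmup, 0.1M steps).
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Placeholder value for illustration only; not taken from the paper.
PEAK_LR = 2e-4

if __name__ == "__main__":
    for s in (0, 30_000, 60_000, 530_000, 1_000_000):
        print(s, warmup_linear_decay(s, PEAK_LR))
```

Either function could be wrapped in a framework scheduler (for example, as a multiplicative factor with `peak_lr=1.0`), but that wiring depends on the training code and is left out here.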