Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Flatten Graphs as Sequences: Transformers are Scalable Graph Generators
Authors: Dexiong Chen, Markus Krimmel, Karsten Borgwardt
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of AUTOGRAPH on several graph generation benchmarks, including both small and large graphs, and synthetic and real-world molecular datasets. Our experiments compare its performance to several SOTA methods and particularly focus on evaluating the following aspects: (1) We show its ability to generate relatively small graphs with a 100-fold inference speedup compared to diffusion-based models while maintaining or even improving structural validity. |
| Researcher Affiliation | Academia | Dexiong Chen Max Planck Institute of Biochemistry Martinsried, Germany EMAIL |
| Pseudocode | Yes | Algorithm 1 Causal and Hamiltonian SENT Sampling |
| Open Source Code | Yes | Our code is available at https://github.com/BorgwardtLab/AutoGraph. |
| Open Datasets | Yes | Small synthetic graphs: Planar and SBM. Both of these datasets are from Martinkus et al. [45]. ... Large graphs: Proteins and Point Clouds. The Proteins dataset includes graph representations (contact maps) of proteins from Dobson and Doig [20]. ... QM9. The QM9 dataset, from Wu et al. [68]. ... MOSES and GuacaMol. The MOSES and GuacaMol datasets are obtained from the respective benchmark tools of Polykovskiy et al. [53] and Brown et al. [6]. ... PubChem-10M. PubChem-10M is a subset of about 10M molecules from PubChem curated by Chithrananda et al. [14]. |
| Dataset Splits | Yes | We adopt the standard train/validation/test splits provided in the original sources. The statistics about the datasets are summarized in Table 8. |
| Hardware Specification | Yes | Experiments were conducted on a shared computing cluster with various CPU and GPU configurations, including 16 NVIDIA H100 (80GB) GPUs. Each experiment was allocated resources on a single GPU, along with 8 CPUs and up to 48GB of system RAM. The run-time of each model was measured on a single NVIDIA H100 GPU. |
| Software Dependencies | No | Our implementation leverages the Hugging Face framework [31], providing users with a flexible interface to experiment with SOTA language models for graph generation. ... We employ the AdamW optimizer with a gradient clipping threshold of 1.0, a weight decay of 0.1, and a learning rate schedule with a linear warmup followed by cosine decay, peaking at 6e-4. |
| Experiment Setup | Yes | We maintain a consistent model architecture and size throughout all experiments, specifically using the small GPT configuration (768 hidden dimensions, 12 layers, 12 attention heads). ... We fix the context length to 2048 and use a batch size of 128 if possible, otherwise 64 for larger graphs. In particular, we employ the AdamW optimizer with a gradient clipping threshold of 1.0, a weight decay of 0.1, and a learning rate schedule with a linear warmup followed by cosine decay, peaking at 6e-4. The AdamW hyperparameters are set to β = (0.9, 0.95). ... Each model was trained for 200000, 400000, or 800000 iterations, depending on the dataset size. |
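
The schedule quoted in the Experiment Setup row (linear warmup followed by cosine decay, peaking at 6e-4) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation: the `TrainConfig` and `learning_rate` names are hypothetical, and the warmup length (`warmup_steps`) and final learning rate (`min_lr`) are not reported in the table, so the values used here are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup row.
    hidden_dim: int = 768
    num_layers: int = 12
    num_heads: int = 12
    context_length: int = 2048
    batch_size: int = 128          # 64 for larger graphs
    peak_lr: float = 6e-4
    weight_decay: float = 0.1
    grad_clip: float = 1.0
    betas: tuple = (0.9, 0.95)
    total_steps: int = 200_000     # 400_000 or 800_000 for larger datasets
    # ASSUMPTIONS: not stated in the table; chosen only for illustration.
    warmup_steps: int = 2_000
    min_lr: float = 0.0


def learning_rate(step: int, cfg: TrainConfig) -> float:
    """Linear warmup to cfg.peak_lr, then cosine decay to cfg.min_lr."""
    if step < cfg.warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return cfg.peak_lr * (step + 1) / cfg.warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, cfg.total_steps - cfg.warmup_steps)
    progress = min(1.0, (step - cfg.warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return cfg.min_lr + (cfg.peak_lr - cfg.min_lr) * cosine


if __name__ == "__main__":
    cfg = TrainConfig()
    for step in (0, cfg.warmup_steps, cfg.total_steps // 2, cfg.total_steps):
        print(step, learning_rate(step, cfg))
```

In a real training loop, the value returned by `learning_rate` would be assigned to the optimizer's learning rate at every iteration, with the weight decay, β values, and gradient clipping threshold taken from the same configuration object.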