Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation

Authors: Lingxiao Zhao, Xueying Ding, Leman Akoglu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "PARD achieves new SOTA performance on many molecular and non-molecular datasets without any extra features, significantly outperforming DiGress [44]. Thanks to efficient architecture and parallel training, PARD scales to large datasets like MOSES [33] with 1.9M graphs. PARD is open-sourced at https://github.com/LingxiaoShawn/Pard"
Researcher Affiliation | Academia | "Lingxiao Zhao, Carnegie Mellon University, lingxiaozlx@gmail.com; Xueying Ding, Carnegie Mellon University, xding2@andrew.cmu.edu; Leman Akoglu, Carnegie Mellon University, lakoglu@andrew.cmu.edu"
Pseudocode | Yes | "We provide the training and inference algorithms for PARD in Apdx. A.8. Specifically, Algo. 2 is used to train the next block's size prediction model; Algo. 3 is used to train the shared diffusion for block conditional probabilities; and Algo. 4 presents the generation steps. ... Algorithm 1 Structural Partial Order ϕ" (a hedged sketch of this generation loop appears after the table)
Open Source Code | Yes | "PARD is open-sourced at https://github.com/LingxiaoShawn/Pard"
Open Datasets | Yes | "We experiment with three different molecular datasets used across the graph generation literature: (1) QM9 [34], (2) ZINC250K [23], and (3) MOSES [33] that contains more than 1.9 million graphs. ... We use five generic graph datasets with various structures and semantics: (1) COMMUNITY-SMALL [48], (2) CAVEMAN [47], (3) CORA [35], (4) BREAST [15], and (5) GRID [48]."
Dataset Splits | Yes | "We use an 80%-20% train and test split, and among the train data we split an additional 20% as validation." (see the data-loading and splitting sketch after the table)
Hardware Specification | Yes | "We use a single RTX-A6000 GPU for all experiments."
Software Dependencies | No | "We use PyTorch Geometric [14], and we implement our combination of PPGN and Transformer by referencing the code in Maron et al. [30] and Ma et al. [29]. Additionally, we use PyTorch Lightning [12] for training and keeping the code clean." (Software names are mentioned, but specific version numbers are not provided.)
Experiment Setup | Yes | "We use the Adam optimizer with a cosine decay learning rate scheduler to train. For diffusion and block-size prediction, we also input the embedding of block id and node degree as additional features... For each block's diffusion model, we set the maximum time steps to 40 without much tuning." (see the optimizer and scheduler sketch after the table)
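To make the quoted algorithm structure concrete, below is a minimal, hypothetical sketch of a PARD-style blockwise autoregressive generation loop in the spirit of Algo. 4: predict the next block's size, append that many noisy nodes and edges, and run the shared block-conditional discrete diffusion over the new block. Everything except the 40 diffusion steps quoted above is an assumption (predict_block_size, denoise_block, MAX_BLOCK_SIZE, the node/edge type vocabularies are placeholders); the authors' actual implementation is in the linked repository.

```python
# Hypothetical sketch of PARD-style blockwise autoregressive generation.
# The two model functions are random placeholders, NOT the authors' code;
# see https://github.com/LingxiaoShawn/Pard for the official implementation.
import torch

MAX_BLOCK_SIZE = 8   # assumed cap on nodes added per block
NUM_NODE_TYPES = 4   # placeholder label vocabularies
NUM_EDGE_TYPES = 2
T = 40               # diffusion steps per block, as reported in the paper


def predict_block_size(node_types, edge_types):
    """Placeholder for the trained block-size model (Algo. 2).
    Returning 0 signals that generation should stop."""
    return torch.randint(0, MAX_BLOCK_SIZE + 1, ()).item()


def denoise_block(node_types, edge_types, num_new, t):
    """Placeholder for one reverse step of the shared block-conditional
    diffusion model (Algo. 3): resample the new block's node labels and
    incident edge labels, keeping the adjacency symmetric. A trained
    model would predict these conditioned on all previous blocks."""
    n = edge_types.size(0)
    node_types[-num_new:] = torch.randint(0, NUM_NODE_TYPES, (num_new,))
    new_rows = torch.randint(0, NUM_EDGE_TYPES, (num_new, n))
    edge_types[-num_new:, :] = new_rows
    edge_types[:, -num_new:] = new_rows.T
    # symmetrize the overlapping (new-block x new-block) corner, no self-loops
    corner = torch.triu(edge_types[-num_new:, -num_new:], diagonal=1)
    edge_types[-num_new:, -num_new:] = corner + corner.T
    return node_types, edge_types


def generate(max_blocks=10):
    node_types = torch.zeros(0, dtype=torch.long)
    edge_types = torch.zeros(0, 0, dtype=torch.long)
    for _ in range(max_blocks):
        k = predict_block_size(node_types, edge_types)   # next block's size
        if k == 0:                                       # empty block => stop
            break
        n = node_types.numel() + k
        # append k noisy nodes and their (initially empty) incident edges
        node_types = torch.cat([node_types, torch.randint(0, NUM_NODE_TYPES, (k,))])
        grown = torch.zeros(n, n, dtype=torch.long)
        grown[: n - k, : n - k] = edge_types
        edge_types = grown
        # run the shared discrete diffusion over the new block only
        for t in reversed(range(1, T + 1)):
            node_types, edge_types = denoise_block(node_types, edge_types, k, t)
    return node_types, edge_types


if __name__ == "__main__":
    nodes, edges = generate()
    print(nodes.shape, edges.shape)
```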
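The reported split protocol (80%-20% train/test, with a further 20% of the training data held out for validation) can be sketched as follows, assuming PyTorch Geometric's built-in QM9 dataset; the root path, shuffling, and slicing order are illustrative assumptions rather than the authors' exact preprocessing.

```python
# Minimal sketch of the reported data protocol on QM9, using PyTorch Geometric.
from torch_geometric.datasets import QM9

dataset = QM9(root="data/QM9").shuffle()  # download path is an assumption

n = len(dataset)
n_test = int(0.2 * n)                 # 80%-20% train/test split
test_set = dataset[:n_test]
train_full = dataset[n_test:]

n_val = int(0.2 * len(train_full))    # 20% of the train data as validation
val_set = train_full[:n_val]
train_set = train_full[n_val:]

print(len(train_set), len(val_set), len(test_set))
```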
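Finally, a minimal sketch of the reported optimization setup: Adam with a cosine-decay learning rate schedule in PyTorch. The stand-in model, base learning rate, epoch count, and dummy loss are assumptions not given in the section.

```python
# Sketch of the reported optimizer/scheduler pairing; hyperparameters are placeholders.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)             # stand-in for the PPGN + Transformer network
num_epochs = 100                      # assumed training length

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    x = torch.randn(8, 16)
    loss = model(x).pow(2).mean()     # dummy objective standing in for the diffusion loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                  # cosine decay applied once per epoch
```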