Graph Diffusion Policy Optimization

Authors: Yijing Liu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Wei Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that GDPO achieves state-of-the-art performance in various graph generation tasks with complex and diverse objectives. Code is available at https://github.com/sail-sg/GDPO. ... Our paper does not propose any theoretical results.
Researcher Affiliation | Collaboration | Yijing Liu¹, Chao Du², Tianyu Pang², Chongxuan Li³, Min Lin², Wei Chen¹; ¹State Key Lab of CAD&CG, Zhejiang University; ²Sea AI Lab, Singapore; ³Renmin University of China
Pseudocode | Yes | Algorithm 1: Graph Diffusion Policy Optimization
Open Source Code | Yes | Code is available at https://github.com/sail-sg/GDPO. ... In the supplementary materials, we provide the code, dataset, and instructions for reproduction.
Open Datasets | Yes | We evaluate GDPO on two benchmark datasets: SBM (200 nodes) and Planar (64 nodes), each consisting of 200 graphs. ... ZINC250k [27] and MOSES [49]. ... In the supplementary materials, we provide the code, dataset, and instructions for reproduction.
Dataset Splits | Yes | We set T = 1000, |T| = 200, and N = 100. The number of trajectory samples K is 64 for SBM and 256 for Planar. ... During fine-tuning, we keep all layers fixed except for attention, set the learning rate to 0.00001, and utilize gradient clipping to limit the gradient norm to be less than or equal to 1. ... All statistical results are obtained by repeating the experiment five times, and the corresponding standard deviations are provided.
Hardware Specification | Yes | We conducted all experiments on a single A100 GPU with 40GB of VRAM and an AMD EPYC 7352 24-core Processor.
Software Dependencies | No | QED and SA scores are computed using the RDKit library. However, no specific version number for RDKit or any other software dependency is provided, which is required for reproducibility.
Experiment Setup | Yes | We set T = 1000, |T| = 200, and N = 100. The number of trajectory samples K is 64 for SBM and 256 for Planar. ... set the learning rate to 0.00001, and utilize gradient clipping to limit the gradient norm to be less than or equal to 1.
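
The experiment-setup values quoted above can be gathered into a small configuration object. The following is a minimal sketch in plain Python and is not taken from the GDPO repository: the class name, field names, and helper functions are all hypothetical. The norm clipping shown is the generic "rescale to the maximum L2 norm" rule, and the use of the sample standard deviation is an assumption, since the table does not say which estimator the authors report.

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class FineTuneConfig:
    """Values quoted in the table; the field names are hypothetical."""
    T: int = 1000                # "T = 1000"
    t_set_size: int = 200        # "|T| = 200" (meaning not stated in the table)
    N: int = 100                 # "N = 100"
    learning_rate: float = 1e-5  # "learning rate to 0.00001"
    max_grad_norm: float = 1.0   # gradient norm limited to <= 1
    num_repeats: int = 5         # experiments repeated five times


# Number of trajectory samples K per dataset, as quoted in the table.
TRAJECTORY_SAMPLES = {"SBM": 64, "Planar": 256}


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so its L2 norm is <= max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def summarize(values):
    """Mean and sample standard deviation over repeated runs (assumed estimator)."""
    return mean(values), stdev(values)
```

As a usage example, `clip_by_global_norm([3.0, 4.0], FineTuneConfig().max_grad_norm)` rescales a gradient of norm 5 down to norm 1, and `summarize` can be applied to the five per-run metric values to produce a mean and a standard deviation for reporting.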