Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Diffusion-Guided Graph Data Augmentation

Authors: Maria Marrium, Arif Mahmood, Muhammad Haris Khan, M. Shakeel, Wenxiong Kang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on 12 benchmark datasets for node classification, link prediction, and graph classification, D-GDA has shown excellent performance compared to 30 state-of-the-art methods.
Researcher Affiliation | Academia | (1) Information Technology University, Lahore, Pakistan; (2) MBZUAI, Abu Dhabi, UAE; (3) South China University of Technology, Guangdong, China
Pseudocode | No | The paper describes the D-GDA framework methodology in Section 3 and its components (Target Sample Selector, Graph Variational Autoencoder, Latent Diffusion Model) in detail, but it does not include a dedicated pseudocode or algorithm block.
Open Source Code | Yes | Code is available at https://github.com/MariaMarrium/D-GDA.
Open Datasets | Yes | We evaluate our proposed method on four small-scale datasets: Cora, Citeseer, Pubmed [56], and Flickr [46] and two large-scale datasets: Ogbn-Arxiv [22], and Ogbn-Products [22]... We evaluate D-GDA on five link prediction benchmarks: ogbl-collab [70], ogbl-ddi [75], Cora, Citeseer, and Pubmed [56]... We evaluate our proposed method on four molecular property prediction datasets: ogbg-molSIDER, ogbg-ClinTox, ogbg-molHIV, and ogbg-molBACE [78].
Dataset Splits | Yes | Table 17: Summary statistics of node classification evaluation datasets (e.g., Cora: 140/500/1,000 [29])... Table 18: Summary statistics of link prediction evaluation datasets (e.g., Ogbl-collab: 92/04/04 [22])... Table 19: Summary statistics of graph classification evaluation datasets (e.g., ogbg-molSIDER: 80/10/10 [22]).
Hardware Specification | Yes | Training and inference time are reported using a 4x A16 GPU machine with 128GB RAM.
Software Dependencies | No | The paper describes the use of a GCN-based encoder, Adam optimizer, and 1D-UNet, but does not provide specific version numbers for software libraries or dependencies like PyTorch, TensorFlow, Python, or CUDA.
Experiment Setup | Yes | For TSS, we train a baseline 2-layer Graph Convolutional Network (GCN) with a hidden dimension of 32, trained using the Adam optimizer with a learning rate of 0.001 for 500 epochs. Early stopping is applied with a patience of 20 epochs... The GVAE consists of a 2-layer GCN encoder... and two 2-layer Multi-Layer Perceptrons (MLPs)... We set the latent dimension to 64... optimized using a composite loss function: $\mathcal{L}_{\text{GVAE}} = \mathcal{L}_{\text{edge}} + \lambda_1 \mathcal{L}_{\text{feat}} + \lambda_2 \mathcal{L}_{\text{KL}}$, where $\lambda_1 = 0.3$ weights the feature reconstruction loss and $\lambda_2 = 0.01$ controls the KL-divergence term for regularization... trained for 1000 epochs with the Adam optimizer, using a learning rate of 0.01 and a weight decay of $5 \times 10^{-5}$. Following [34, 47], we apply edge masking at a rate of 0.3 and feature masking at a rate of 0.5... Finally, we train a Latent Diffusion Model (LDM) with a timestep T = 1000 and a hidden dimension of 64. The LDM is optimized using the Adam optimizer with a learning rate of $1 \times 10^{-4}$ for 1000 epochs...
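
The composite GVAE objective and the early-stopping rule quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' code: the names `GVAEConfig`, `gvae_loss`, and `early_stop_epoch` are hypothetical, the loss terms are plain scalars rather than tensors, and monitoring a validation loss for early stopping is an assumption, since the quoted text does not say which metric is tracked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GVAEConfig:
    """GVAE hyperparameters as quoted above (field names are hypothetical)."""
    latent_dim: int = 64
    lambda_feat: float = 0.3      # lambda_1: weight on feature reconstruction
    lambda_kl: float = 0.01       # lambda_2: weight on the KL-divergence term
    lr: float = 0.01
    weight_decay: float = 5e-5
    epochs: int = 1000
    edge_mask_rate: float = 0.3
    feat_mask_rate: float = 0.5


def gvae_loss(l_edge, l_feat, l_kl, cfg=GVAEConfig()):
    """L_GVAE = L_edge + lambda_1 * L_feat + lambda_2 * L_KL."""
    return l_edge + cfg.lambda_feat * l_feat + cfg.lambda_kl * l_kl


def early_stop_epoch(val_losses, patience=20):
    """Return the 0-based epoch at which training stops, or None if it never does.

    Assumption: training stops once the monitored loss has failed to improve
    for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None
```

With component losses of 1.0 (edge), 2.0 (feature), and 10.0 (KL), `gvae_loss` gives roughly 1.7, which shows how lightly the KL term is weighted relative to edge reconstruction under these settings.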