Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments
Authors: Enjun Du, Xunkai Li, Tian Jin, Zhihan Zhang, Rong-Hua Li, Guoren Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate GraphMaster comprehensively, we formulate four research questions: (RQ1): Can GraphMaster generate high-quality text-attributed graph data in data-limited environment? (RQ2): Can the graph data synthesized by GraphMaster retain the original graph features well? (RQ3): Can GraphMaster maintain interpretability well? (RQ4): What is the relative contribution of each component in GraphMaster to the overall synthesis quality? ... We evaluate GraphMaster's ability to synthesize high-quality graph data by applying it to enhance the data-limited datasets we created and assessing whether the enhanced datasets improve downstream model performance. We employ standard metrics including Accuracy and F1 Score as evaluation criteria, with higher values indicating superior performance. |
| Researcher Affiliation | Academia | Enjun Du¹, Xunkai Li¹, Tian Jin², Zhihan Zhang¹, Rong-Hua Li¹, Guoren Wang¹; ¹Beijing Institute of Technology, ²The Hong Kong University of Science and Technology (Guangzhou) |
| Pseudocode | Yes | Algorithm 1 M-Preserving Graph Sampling |
| Open Source Code | Yes | Code is available on https://github.com/EnjunDu/GraphMaster. |
| Open Datasets | Yes | Our experiments utilize six widely recognized text-attributed graph datasets: Cora [32], Citeseer [13], Wikics [10], Arxiv2023 [36], and History and Children [49]. It is worth noting that in order to better simulate the data-limited environment to test the effect of data synthesis, we created 6 data-limited datasets, namely SubCora, SubCiteseer, SubWikics, SubHistory, SubArxiv2023, and SubChildren (details are given in Appendix C). ... After the paper is accepted, we will open source the complete data-limited dataset and its creation code... |
| Dataset Splits | Yes | Table 3: Dataset Statistics. SubCora: 1354 nodes, 2486 edges, 7 classes, 99 Louvain communities, 815 training nodes, 267 validation nodes, 272 test nodes. |
| Hardware Specification | Yes | We ran the entire experiment on eight 80G A100 GPUs... We selected QwQ-32B [37] as the large language model for these two baselines, and used two A6000 GPUs with 48G memory for the experiments. |
| Software Dependencies | No | In training the GNN model, we first initialized the text attributes with Sentence-BERT [35] to generate the initial features before proceeding with training. |
| Experiment Setup | Yes | For the background knowledge nodes, we set N = 30, and for the newly generated nodes, we configured M% = 15% (The hyperparameter selection analysis are given in Appendix E). In training the GNN model, we first initialized the text attributes with Sentence-BERT [35] to generate the initial features before proceeding with training. To ensure the robustness of our experiments, we repeated each experiment 50 times and reported the mean and standard deviation of the results. ... Appendix E: Knowledge extraction: Sample size N = 30 nodes provides sufficient context without introducing noise; Node generation: Setting M% = 15% of knowledge nodes balances quantity and quality; Community detection: Parameters µ = 0.5 and γ = 0.5 effectively balance semantic and structural factors; Stochastic sampling: β = 2.0 maintains appropriate exploration-exploitation balance; Edge formation: For semantic mode, (θ1, θ2, θ3) = (0.6, 0.3, 0.1); for topological mode, (0.2, 0.5, 0.3); Quality assessment: Initial threshold τ0 = 7.0 with adaptive update rate ζ = 0.1; Convergence criteria: ϵ = 0.05 provides sufficient refinement iterations; Objective weights: Initialize λsem = λstruct = λbal = 0.33 with learning rate η = 0.05. |
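
For readers who want to reuse the values quoted in the Experiment Setup row, the sketch below collects them in one place. It is only an illustration: the class name `GraphMasterSettings`, its field names, and the helper `edge_weights_are_normalized` are hypothetical and are not taken from the GraphMaster code base; only the numeric values come from the quoted text.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class GraphMasterSettings:
    """Hypothetical container for the hyperparameters quoted above.

    Field names are assumptions made for this sketch; the values are the
    ones reported in the Experiment Setup row (main text and Appendix E).
    """
    knowledge_sample_size: int = 30      # N, background knowledge nodes
    new_node_fraction: float = 0.15      # M% = 15%
    mu: float = 0.5                      # community detection parameter
    gamma: float = 0.5                   # community detection parameter
    beta: float = 2.0                    # stochastic sampling parameter
    edge_weights_semantic: tuple = (0.6, 0.3, 0.1)     # (theta1, theta2, theta3)
    edge_weights_topological: tuple = (0.2, 0.5, 0.3)  # (theta1, theta2, theta3)
    tau0: float = 7.0                    # initial quality threshold
    zeta: float = 0.1                    # adaptive update rate for the threshold
    epsilon: float = 0.05                # convergence criterion
    lambda_sem: float = 0.33             # initial objective weights
    lambda_struct: float = 0.33
    lambda_bal: float = 0.33
    eta: float = 0.05                    # learning rate for the objective weights
    num_repeats: int = 50                # each experiment repeated 50 times


def edge_weights_are_normalized(settings: GraphMasterSettings) -> bool:
    """Check that both reported edge-weight triples sum to 1 (up to rounding)."""
    return all(
        math.isclose(sum(weights), 1.0)
        for weights in (settings.edge_weights_semantic,
                        settings.edge_weights_topological)
    )


settings = GraphMasterSettings()
print(edge_weights_are_normalized(settings))
```

Both edge-weight triples sum to 1, so the check passes. The three initial objective weights sum to 0.99 rather than 1, which is why the sketch does not apply the same check to them.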