Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining
Authors: Chunyu Wei, Wenji Hu, Xingjia Hao, Xin Wang, Yifan Yang, Yunhai Wang, Yang Tian, Yueguo Chen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show GraphChain significantly outperforms prior methods, enabling scalable and adaptive LLM-driven graph analysis. Extensive experimentation demonstrating that GraphChain significantly outperforms existing methods by an average of 20.7%, with exceptional scalability handling graphs up to 200,000 nodes while maintaining consistent performance. |
| Researcher Affiliation | Academia | ¹Renmin University of China, China; ²Guangxi University, China; ³Beijing Jiaotong University, China |
| Pseudocode | No | The paper describes the methodology using prose and mathematical formulations (e.g., in Section 4 Methodology and its subsections) and a case study in Figure 6 that illustrates a sequence of actions, but it does not include a dedicated, structured pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/wuanjunruc/GraphChain |
| Open Datasets | Yes | We evaluate GraphChain on five diverse graph datasets representing different real-world domains, as illustrated in Table 1. ... Citation Graphs: Cora [Yang et al., 2016], CiteSeer, PubMed ... Social Networks: Facebook [Leskovec and Mcauley, 2012], Twitter ... Chemical Molecules: QM9 [Wu et al., 2018] ... Traffic Networks: METR-LA [Chen et al., 2020] ... Financial Networks: Elliptic [Weber et al., 2019] |
| Dataset Splits | Yes | We allocated 500 pairs per scenario for training and 100 for testing, with domain experts crafting exemplary instruction templates to ensure ecological validity. |
| Hardware Specification | Yes | All experiments were conducted on 2 NVIDIA A800 80GB GPUs, using LoRA-based fine-tuning (rank r=16, alpha=32) on the Qwen2.5-7B-instruction model. |
| Software Dependencies | No | The paper mentions specific models like Qwen2.5-7B-instruction, Llama2-13B, and Llama3-8B, but does not provide specific version numbers for general software dependencies like Python, PyTorch, or other deep learning frameworks used in the implementation. |
| Experiment Setup | Yes | Supervised Fine-Tuning (SFT) Stage: We used a learning rate of 5e-5 with 4% warmup and a cosine scheduler for 8 epochs... Reinforcement Learning (RL) Stage: ...Learning rate: 1e-5, Batch size: 8, Initial KL coefficient: 0.3, Loss coefficient (β): 0.15, GAE parameter (λ): 0.95, Discount factor (γ): 0.99. Test-Time Adaptation Stage: ...Learning rate: 0.01, Batch size: 10. |
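
For readers who want to re-run the setup, the hyperparameters quoted in the Hardware Specification and Experiment Setup rows can be gathered into a small configuration sketch. This is not the authors' code: the class names, field names, and the `cosine_lr` helper are hypothetical, and the schedule shape (linear warmup, then cosine decay to zero) is an assumption, since the quoted text only says "4% warmup and a cosine scheduler". Only the numeric values come from the table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """LoRA values quoted in the Hardware Specification row."""
    rank: int = 16
    alpha: int = 32

    @property
    def scaling(self) -> float:
        # Conventional LoRA scaling factor alpha / r (assumed, not stated in the table).
        return self.alpha / self.rank


@dataclass(frozen=True)
class SftSettings:
    """Supervised fine-tuning stage."""
    learning_rate: float = 5e-5
    warmup_fraction: float = 0.04
    epochs: int = 8


@dataclass(frozen=True)
class RlSettings:
    """Reinforcement learning stage."""
    learning_rate: float = 1e-5
    batch_size: int = 8
    initial_kl_coefficient: float = 0.3
    loss_coefficient_beta: float = 0.15
    gae_lambda: float = 0.95
    discount_gamma: float = 0.99


@dataclass(frozen=True)
class TtaSettings:
    """Test-time adaptation stage."""
    learning_rate: float = 0.01
    batch_size: int = 10


def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_fraction: float) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero (assumed shape)."""
    warmup_steps = max(1, round(total_steps * warmup_fraction))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    sft = SftSettings()
    total_steps = 1000  # hypothetical; the table does not report a step count
    for step in (0, 39, 520, 999):
        print(step, cosine_lr(step, total_steps, sft.learning_rate, sft.warmup_fraction))
```

Keeping each stage in its own frozen dataclass makes it harder to mix up the three learning rates (5e-5, 1e-5, 0.01), which differ by orders of magnitude across stages.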