Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning

Authors: Xiaojun Guo, Ang Li, Yifei Wang, Stefanie Jegelka, Yisen Wang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce G1, a simple yet effective approach demonstrating that Reinforcement Learning (RL) on synthetic graph-theoretic tasks can significantly scale LLMs' graph reasoning abilities. To enable RL training, we curate Erdős, the largest graph reasoning dataset to date, comprising 50 diverse graph-theoretic tasks of varying difficulty levels, 100k training data and 5k test data, all derived from real-world graphs. With RL on Erdős, G1 obtains substantial improvements in graph reasoning, where our finetuned 3B model even outperforms Qwen2.5-72B-Instruct (24x its size).
Researcher Affiliation | Academia | Xiaojun Guo¹, Ang Li¹, Yifei Wang³, Stefanie Jegelka⁴,³, Yisen Wang¹,²; ¹State Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; ²Institute for Artificial Intelligence, Peking University; ³MIT; ⁴TUM
Pseudocode | No | The paper describes the GRPO algorithm and concepts such as BFS and Dijkstra's algorithm, but does not present them in a structured pseudocode or algorithm block.
Open Source Code | Yes | Our implementation is open-sourced at https://github.com/PKU-ML/G1, with models and datasets hosted on Hugging Face collections PKU-ML/G1 for broader accessibility.
Open Datasets | Yes | Our implementation is open-sourced at https://github.com/PKU-ML/G1, with models and datasets hosted on Hugging Face collections PKU-ML/G1 for broader accessibility.
Dataset Splits | Yes | For the training split, there are a total of 100,000 question-answer pairs, evenly distributed across tasks with 2,000 examples each. We also reserve 5,000 test pairs with different questions for evaluation.
Hardware Specification | Yes | We finetune our model from Qwen2.5-Instruct models (3B and 7B) for 300 steps with batch size 512 on a cluster of 8 A800 GPUs
Software Dependencies | No | The paper mentions using Qwen2.5-Instruct models, NetworkX, and the vLLM engine, but does not give version numbers for these components or for other key libraries and programming languages, which limits reproducibility.
Experiment Setup | Yes | We finetune our model from Qwen2.5-Instruct models (3B and 7B) for 300 steps with batch size 512 on a cluster of 8 A800 GPUs... For GRPO training, we set ϵ to be 0.02, β to be 0.001, group size G to be 5, and context length to be 4096 unless otherwise specified. We additionally incorporate an entropy loss of weight 0.001 to encourage the policy to explore. Lastly, we train the models on 8x A800 GPUs with batch size of 512.
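The Pseudocode row notes that BFS and Dijkstra's algorithm are described only in prose. As a concrete reference point, here is a minimal standard-library sketch of both on a weighted adjacency list. The graph format, the function names (`dijkstra`, `bfs_hops`, `undirected`) and the toy graph are illustrative assumptions, not the authors' code.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical adjacency-list format: node -> list of (neighbor, weight).
Graph = Dict[int, List[Tuple[int, float]]]


def undirected(edges: List[Tuple[int, int, float]]) -> Graph:
    """Build an undirected graph from (u, v, weight) triples."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g


def dijkstra(graph: Graph, source: int) -> Dict[int, float]:
    """Shortest-path distances from `source`; assumes non-negative weights."""
    dist: Dict[int, float] = {source: 0.0}
    heap: List[Tuple[float, int]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def bfs_hops(graph: Graph, source: int) -> Dict[int, int]:
    """Unweighted hop counts from `source` (edge weights are ignored)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, _ in graph.get(u, []):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


g = undirected([(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)])
print(dijkstra(g, 0))  # node 1 is reached via 2 at cost 3.0, not directly at 4.0
print(bfs_hops(g, 0))
```

The two functions differ only in their frontier: a FIFO queue suffices when every edge counts as one hop, while weighted distances need a priority queue keyed on tentative distance.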
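The Dataset Splits row reports 2,000 training pairs per task (50 × 2,000 = 100,000) and 5,000 test pairs whose questions differ from the training ones. The sketch below shows one way to build such a split. It deduplicates questions within each task, takes a fixed number per task for training, and keeps the rest as a held-out pool that shares no question with training. The helper `split_per_task` and the toy pool are hypothetical. The row does not say how the 5,000 test pairs are distributed across tasks, so the sketch stops at the held-out pool.

```python
import random
from typing import Dict, List, Tuple

QA = Tuple[str, str]  # (question, answer)
Tagged = Tuple[str, QA]  # (task name, pair)


def split_per_task(
    pool: Dict[str, List[QA]], train_per_task: int, seed: int = 0
) -> Tuple[List[Tagged], List[Tagged]]:
    """Hypothetical helper: per task, keep one pair per distinct question,
    shuffle, take `train_per_task` pairs for training and return the rest as
    held-out candidates that never share a question with training."""
    rng = random.Random(seed)
    train: List[Tagged] = []
    held_out: List[Tagged] = []
    for task, pairs in sorted(pool.items()):
        unique = list({q: (q, a) for q, a in pairs}.values())  # dedup by question
        if len(unique) < train_per_task:
            raise ValueError(f"task {task!r} has only {len(unique)} distinct questions")
        rng.shuffle(unique)
        train += [(task, qa) for qa in unique[:train_per_task]]
        held_out += [(task, qa) for qa in unique[train_per_task:]]
    return train, held_out


# Toy pool: 3 tasks x 10 distinct questions, plus one duplicate in task0.
pool = {f"task{t}": [(f"t{t}-q{i}", str(i)) for i in range(10)] for t in range(3)}
pool["task0"].append(("t0-q0", "0"))  # duplicate question, removed by dedup
train, held_out = split_per_task(pool, train_per_task=6)
print(len(train), len(held_out))
```

With the reported configuration this would be called with `train_per_task=2000` over 50 tasks, giving 100,000 training pairs. The test set would then be drawn from `held_out`.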
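The Hardware Specification and Experiment Setup rows report 300 steps at batch size 512 on 8 A800 GPUs. A back-of-the-envelope estimate of the data budget follows. It assumes the batch size counts prompts and is split evenly across the GPUs; neither is stated in the rows.

```latex
\begin{aligned}
\text{prompts seen} &= 300 \times 512 = 153{,}600,\\
\text{passes over the training split} &= \frac{153{,}600}{100{,}000} \approx 1.54,\\
\text{prompts per GPU per step} &= \frac{512}{8} = 64.
\end{aligned}
```

Suppose instead that 512 counted sampled completions, with G = 5 per prompt. The run would then cover 300 × 512 / 5 = 30,720 prompts, about 0.31 passes. However, 512 is not a multiple of 5, which makes this reading less likely.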
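The Software Dependencies row flags missing version numbers for NetworkX and vLLM. A minimal standard-library sketch for recording installed versions alongside a run follows. `record_versions` is a hypothetical helper, and the package list holds only the two libraries the row names.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def record_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, plus the Python version."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI; extend with the rest of the stack.
    for pkg, ver in record_versions(["networkx", "vllm"]).items():
        print(f"{pkg}=={ver}")
```

Writing this output next to checkpoints or logs gives later readers the exact versions behind a result, even when the paper omits them.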
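The Experiment Setup row lists the GRPO hyperparameters: clipping ϵ = 0.02, KL weight β = 0.001, group size G = 5 and entropy weight 0.001. The sketch below shows how they could enter a loss for one prompt's group of completions. It is a sequence-level simplification, with one log-probability and one mean entropy per completion. Real GRPO implementations work per token, and the exact normalization varies between codebases. Group-relative advantages use the population standard deviation. The KL term uses the non-negative estimator π_ref/π − log(π_ref/π) − 1 from the original GRPO formulation. The 0/1 answer-match reward in the usage example is an assumption, since the rows do not describe the reward.

```python
import math
import statistics
from typing import List, Sequence

# Hyperparameters as reported in the Experiment Setup row.
CLIP_EPS = 0.02       # ϵ, PPO-style clipping range
KL_BETA = 0.001       # β, weight of the KL penalty to the reference policy
GROUP_SIZE = 5        # G, completions sampled per prompt
ENTROPY_COEF = 0.001  # weight of the entropy bonus


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one prompt's group (population std)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def kl_estimate(logp: float, logp_ref: float) -> float:
    """Non-negative KL estimator: pi_ref/pi - log(pi_ref/pi) - 1."""
    log_ratio = logp_ref - logp
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    logp_ref: Sequence[float],
    rewards: Sequence[float],
    entropies: Sequence[float],
) -> float:
    """Sequence-level simplification of a GRPO loss for one group.

    One value per completion: log-probabilities under the current, sampling
    (old) and reference policies, a scalar reward, and a mean entropy.
    """
    if len(rewards) != GROUP_SIZE:
        raise ValueError(f"expected {GROUP_SIZE} completions, got {len(rewards)}")
    advantages = group_advantages(rewards)
    objective = 0.0
    for lp, lp_old, lp_ref, adv, ent in zip(
        logp_new, logp_old, logp_ref, advantages, entropies
    ):
        ratio = math.exp(lp - lp_old)
        clipped = min(max(ratio, 1.0 - CLIP_EPS), 1.0 + CLIP_EPS)
        surrogate = min(ratio * adv, clipped * adv)  # pessimistic clipped bound
        objective += surrogate - KL_BETA * kl_estimate(lp, lp_ref) + ENTROPY_COEF * ent
    return -objective / len(rewards)  # minimize the negated group-average objective


# Assumed reward: 1.0 if a completion's final answer matched, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0]
logps = [-12.0, -15.0, -9.5, -11.0, -14.0]
# Before any update the three policies coincide, so ratios are 1 and KL is 0.
loss = grpo_loss(logps, logps, logps, rewards, [0.0] * GROUP_SIZE)
print(loss)
```

With ϵ = 0.02 the ratio may move only 2% from 1 before the clipped term takes over. This is a much tighter trust region than the 0.1 to 0.2 common in PPO.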