Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation

Authors: Hanchen Su, Xuyuan Li, Yan Zhou, Zhuoyi Lu, Ziwei Chai, Haozheng Wang, Chen Zhang, Yang Yang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We train Cypher-RI from scratch using Qwen2.5-Coder-7B and perform comprehensive evaluations on multiple Text-to-Cypher benchmarks. Notably, we validate the generalization capability of our framework by evaluating the trained models on graph databases distinct from those used during training. On Cypher Bench [8], our models achieve substantial absolute improvements ranging from 13.12% to 69.59% over the base model, and outperform the strong GPT-4o baseline by up to 9.41%. Extensive empirical evaluations validate the effectiveness of our method, consistently achieving substantial improvements over baseline approaches and even outperforming the strong GPT-4o model while being much more cost-effective.
Researcher Affiliation	Collaboration	Hanchen Su Zhejiang University EMAIL Xuyuan Li Zhejiang University EMAIL Yan Zhou Createlink Technology EMAIL Zhuoyi Lu Zhejiang University EMAIL Ziwei Chai Zhejiang University EMAIL Haozheng Wang Independent Researcher EMAIL Chen Zhang Createlink Technology EMAIL Yang Yang Zhejiang University EMAIL
Pseudocode	No	The paper describes the training template in Table 1 and the GRPO algorithm steps in section 2.1, but it does not present them in a structured pseudocode or algorithm block.
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We use publicly available datasets and the code of our work is fully provided.
Open Datasets	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We use publicly available datasets and the code of our work is fully provided. We use two datasets to evaluate the conversion of natural language into Cypher queries: Cypher Bench, Neo4j-Text2Cypher.
Dataset Splits	Yes	The training dataset consists of four domains: biology, soccer, art, and terrorist attack, comprising a total of 8,295 examples. This diverse collection enables the model to learn from a broad spectrum of semantic structures and query intents. The Cypher Bench test set, designed to evaluate zero-shot generalization, includes seven distinct domains such as geography, flight accident, politics, company, fictional character, movie, and NBA. The Neo4j-Text2Cypher dataset, which covers 15 diverse domains... With a total of 2,380 examples.
Hardware Specification	Yes	Our training is conducted on 4 Nvidia A800 GPUs, with full parameter optimization and gradient checkpointing.
Software Dependencies	No	The paper mentions "Qwen2.5-Coder-7B" as the base model but does not specify software dependencies like operating system, programming languages, libraries, or frameworks with their version numbers.
Experiment Setup	Yes	Table 6: Implementation details of Cypher-RI. Base Model: Qwen2.5-Coder-7B; Train Batch Size: 1024; Micro Train Batch Size: 8; Rollout Batch Size: 128; Micro Rollout Batch Size: 16; Learning Rate: 1e-6; Prompt Max Length: 1,024; Generation Max Length: 2,000; Initial KL Coefficient: 0; Mixed Precision: BF16; Rollout Temperature: 1.0; Optimizer: AdamW; Clip Ratio: 0.2; Number of Rollouts: 8
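The Table 6 settings parameterize GRPO training, which the Pseudocode row notes is described only in prose in section 2.1. The following is a minimal Python sketch, not the authors' released code. It records the reported values in a hypothetical `CypherRIConfig` dataclass and illustrates the standard GRPO pieces those values control: group-relative advantages over the 8 rollouts sampled per prompt, and the PPO-style clipped surrogate with clip ratio 0.2. The helper names are hypothetical. The reward function is not described on this page, so the rewards below are made-up binary values. Normalizing by the population standard deviation is an assumption, since GRPO implementations differ on this detail. Because the initial KL coefficient is 0, the sketch omits the KL penalty term.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class CypherRIConfig:
    """Hypothetical container for the values reported in Table 6."""
    base_model: str = "Qwen2.5-Coder-7B"
    train_batch_size: int = 1024
    micro_train_batch_size: int = 8
    rollout_batch_size: int = 128
    micro_rollout_batch_size: int = 16
    learning_rate: float = 1e-6
    prompt_max_length: int = 1024
    generation_max_length: int = 2000
    init_kl_coef: float = 0.0  # 0 means no KL penalty at the start of training
    mixed_precision: str = "bf16"
    rollout_temperature: float = 1.0
    optimizer: str = "AdamW"
    clip_ratio: float = 0.2
    n_rollouts: int = 8  # completions sampled per prompt (the GRPO group size)

    @property
    def samples_per_rollout_step(self) -> int:
        # Hypothetical helper: prompts per rollout batch times completions per prompt.
        return self.rollout_batch_size * self.n_rollouts


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward normalized within its own group.

    Assumption: population std; some implementations use the sample std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float,
                      advantage: float, clip_ratio: float) -> float:
    """PPO-style clipped objective for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)  # importance ratio pi_new / pi_old
    clipped = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
    # Taking the minimum keeps the more pessimistic of the two estimates.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    cfg = CypherRIConfig()
    # Hypothetical rewards for one group of cfg.n_rollouts generated queries.
    rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
    assert len(rewards) == cfg.n_rollouts
    adv = group_advantages(rewards)
    print("advantages:", [round(a, 3) for a in adv])
    # A sample whose probability rose by 1.5x gets its ratio clipped to 1.2.
    print("objective:", clipped_surrogate(math.log(1.5), 0.0, adv[0], cfg.clip_ratio))
```

Note that 128 prompts × 8 rollouts gives 1,024 sampled completions, which coincides with the train batch size of 1024. Reading the train batch size as a count of completions is an interpretation; the table itself does not say which unit it uses.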