Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Hybrid-Balance GFlowNet for Solving Vehicle Routing Problems

Authors: Ni Zhang, Zhiguang Cao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments to validate the effectiveness of the Hybrid-Balance GFlowNet (HBG) in enhancing two representative GFlowNet-based solvers, i.e., AGFN and GFACS, on CVRP. We first present comparison results, followed by ablation studies to analyze the contribution of individual components. Lastly, we extend the evaluation to other vehicle routing problem.
Researcher Affiliation	Academia	Ni Zhang, Zhiguang Cao; School of Computing and Information Systems, Singapore Management University, Singapore
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks. It describes the methodology using natural language, mathematical equations, and figures, but no explicit pseudocode sections.
Open Source Code	Yes	The code is available at https://github.com/ZHANG-NI/HBG
Open Datasets	Yes	Dataset: We adopt synthetic CVRP datasets following standard settings used in prior work [22, 25, 55, 49]. Each instance features a single depot and multiple customers served by a vehicle with fixed capacity C. The depot and customer coordinates are sampled uniformly from the unit square [0, 1]^2, and customer demands follow a uniform distribution U[a, b] with a = 1 and b = 9. The vehicle capacity is fixed at C = 50 across all problem sizes: 100, 200, 500, and 1,000 nodes. For testing, we generate 128 synthetic instances for each of the 200-, 500-, and 1,000-node settings, aligned with evaluation rules established in AGFN and GFACS. Additional experiments on the public benchmark CVRPLib are reported in Appendix B.1.
Dataset Splits	Yes	All models are trained on 100-node instances. For testing, we generate 128 synthetic instances for each of the 200-, 500-, and 1,000-node settings, aligned with evaluation rules established in AGFN and GFACS.
Hardware Specification	Yes	The experiments are conducted on a server equipped with an NVIDIA A100 GPU and an Intel Xeon 6342 CPU.
Software Dependencies	No	The paper does not explicitly mention specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow). It refers to a GNN module and ReLU activation function, but these are components, not versioned software dependencies.
Experiment Setup	Yes	Hyperparameters: We adopt the same model configurations and training settings as AGFN and GFACS, including network architecture, batch size, learning rate, optimizer, and other hyperparameters. Training is conducted using sampling-based decoding with N = 20 routes per instance. During inference, AGFN uses depot-guided inference, and GFACS applies an ant colony search with depot-guided node selection. All models are trained on 100-node instances. The updated results (Table 7a–7c) will be incorporated into the Appendix of the revised version. HBG-GFACS_5×10^-3 refers to the HBG-GFACS model evaluated with a learning rate of 5 × 10^-3; the interpretation is analogous for the other entries. HBG-GFACS_SGD refers to the HBG-GFACS model evaluated using the SGD optimizer; the interpretation is analogous for the other entries.
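
The synthetic data setup quoted in the Open Datasets row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' generator: the names `CVRPInstance`, `generate_cvrp_instance`, and `generate_test_set` are hypothetical, demands are assumed to be integers drawn from {1, ..., 9}, the problem size is assumed to count customers with the depot stored separately, and the seed is arbitrary. The repository linked above is the authoritative source for the actual data pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class CVRPInstance:
    """One synthetic CVRP instance (hypothetical container, not from the paper)."""
    depot: tuple[float, float]
    customers: list[tuple[float, float]]
    demands: list[int]
    capacity: int


def generate_cvrp_instance(num_customers, capacity=50, demand_low=1,
                           demand_high=9, rng=None):
    """Sample one instance following the settings quoted in the table.

    Coordinates are uniform on the unit square; demands are uniform on
    [demand_low, demand_high]. Integer demands are an assumption here.
    """
    rng = rng or random.Random()
    depot = (rng.random(), rng.random())
    customers = [(rng.random(), rng.random()) for _ in range(num_customers)]
    # randint is inclusive on both ends, matching U[a, b] with a = 1, b = 9.
    demands = [rng.randint(demand_low, demand_high) for _ in range(num_customers)]
    return CVRPInstance(depot, customers, demands, capacity)


def generate_test_set(num_customers, num_instances=128, seed=0):
    """Build a fixed-size test set; 128 instances per size as in the table."""
    rng = random.Random(seed)
    return [generate_cvrp_instance(num_customers, rng=rng)
            for _ in range(num_instances)]


if __name__ == "__main__":
    # Test sizes reported in the table: 200, 500, and 1,000 nodes.
    for size in (200, 500, 1000):
        test_set = generate_test_set(size)
        print(size, len(test_set), test_set[0].capacity)
```

Seeding a single `random.Random` per test set keeps the sketch reproducible without touching global state; whether the original work fixes seeds in this way is not stated in the excerpt.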