Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Multi-Task Vehicle Routing Solver via Mixture of Specialized Experts under State-Decomposable MDP

Authors: Yuxin Pan, Zhiguang Cao, Chengyang GU, Liu Liu, Peilin Zhao, Yize Chen, Fangzhen Lin

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments conducted across VRP variants showcase the superiority of MoSES over prior methods. The source code is available at https://github.com/panyxy/moses_vrp. Section 5: Experiments. In this Section, we empirically validate the superiority of MoSES through evaluations on 16 VRP variants with five constraints, supplemented by hyperparameter and ablation studies. All experiments are conducted on NVIDIA Tesla V100-32GB GPUs and Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz.
Researcher Affiliation	Collaboration	1 The Hong Kong University of Science and Technology; 2 Singapore Management University; 3 The Hong Kong University of Science and Technology (Guangzhou); 4 Tencent AI Lab; 5 University of Alberta; 6 Shanghai Jiao Tong University
Pseudocode	No	The paper describes the methodology in prose (Section 4 'Methodology') and provides diagrams (Figure 1), but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks are present.
Open Source Code	Yes	The source code is available at https://github.com/panyxy/moses_vrp.
Open Datasets	Yes	We report the performance comparison of our proposed methods, MoSES(RF) and MoSES(CaDA), against baseline approaches on CVRPLIB instances from the X set, which includes problem sizes ranging from 101 to at most 1,001 nodes, as done in [83, 3].
Dataset Splits	Yes	We report average costs and optimality gaps over 1K test instances... All phases share the training hyperparameters. Each model undergoes 300 training epochs, each containing 100,000 VRP instances generated on the fly. In CVRP, each neural solver is trained on problem instances with a fixed vehicle capacity of Q = 50. To investigate OOD generalization to unseen vehicle capacities, we generate a separate testing dataset for each capacity value in the set {30, 50, 70, 90, 110, 130, 150, 200}.
Hardware Specification	Yes	All experiments are conducted on NVIDIA Tesla V100-32GB GPUs and Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz.
Software Dependencies	No	The paper mentions 'Adam Optimizer' and 'REINFORCE [69] algorithm' but does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA that were used for their own implementation. It mentions third-party solvers like 'PyVRP [71]' and 'Google OR-Tools [57]' as baselines, but without version numbers for these either.
Experiment Setup	Yes	We consider two problem scales N = {50, 100}, and adopt the same training settings with prior works [3, 38] for baseline models. We optimize MoSES using REINFORCE [69]... All phases share the training hyperparameters. Each model undergoes 300 training epochs, each containing 100,000 VRP instances generated on the fly. Adam Optimizer is used with a learning rate of 3 × 10^-4, weight decay of 1 × 10^-6, and batch size of 256. We decay the learning rate by a factor of 10 at epochs 270 and 295. During evaluation, each neural solver employs greedy multi-start rollouts with 8 augmentations, selecting the best one from the generated solutions per instance.
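
The training schedule quoted in the Experiment Setup row can be written down as a small configuration sketch. This is not taken from the authors' repository. The constant and function names are hypothetical, and two details are assumptions the quoted text does not settle: epochs are counted from 1, and each decay takes effect at the start of the named epoch. The batch count per epoch also assumes the final partial batch is kept.

```python
# Hyperparameters as quoted in the Experiment Setup row.
BASE_LR = 3e-4                 # Adam learning rate
WEIGHT_DECAY = 1e-6            # Adam weight decay
BATCH_SIZE = 256
NUM_EPOCHS = 300
INSTANCES_PER_EPOCH = 100_000  # VRP instances generated on the fly
DECAY_EPOCHS = (270, 295)      # epochs at which the learning rate is decayed
DECAY_FACTOR = 0.1             # "decay by a factor of 10"


def learning_rate(epoch: int) -> float:
    """Learning rate for a 1-indexed epoch.

    Assumption: a decay applies from the start of its milestone epoch.
    """
    num_decays = sum(1 for milestone in DECAY_EPOCHS if epoch >= milestone)
    return BASE_LR * DECAY_FACTOR ** num_decays


def batches_per_epoch() -> int:
    """Number of batches per epoch, keeping the final partial batch.

    Assumption: 100,000 is not a multiple of 256, and the source does not
    say whether the remainder is dropped.
    """
    return -(-INSTANCES_PER_EPOCH // BATCH_SIZE)  # ceiling division


if __name__ == "__main__":
    for epoch in (1, 269, 270, 294, 295, NUM_EPOCHS):
        print(f"epoch {epoch:3d}: lr = {learning_rate(epoch):.1e}")
    print("batches per epoch:", batches_per_epoch())
```

In PyTorch, a step schedule of this shape is typically expressed with `torch.optim.lr_scheduler.MultiStepLR` (`milestones`, `gamma`). Whether the authors use that scheduler is not stated in the quoted text.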