Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

RouteFinder: Towards Foundation Models for Vehicle Routing Problems

Authors: Federico Berto, Chuanbo Hua, Nayeli Gast Zepeda, André Hottung, Niels Wouda, Leon Lan, Junyoung Park, Kevin Tierney, Jinkyoo Park

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate RouteFinder through extensive experiments on 48 VRP variants, i.e., three times as many as previous works, assessing the impact of each novel component on performance. Extensive experiments show RouteFinder outperforms recent state-of-the-art learning methods. Our code is publicly available at https://github.com/ai4co/routefinder.
Researcher Affiliation Collaboration KAIST; Bielefeld University; Rotterdam School of Management; VU Amsterdam; Omelet; AI4CO
Pseudocode No The paper describes methods and processes verbally and mathematically, but does not contain clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code Yes Our code is publicly available at https://github.com/ai4co/routefinder.
Open Datasets Yes Finally, Appendix D.2 reports the results for large-scale CVRPLIB, which demonstrate that RouteFinder generalizes better across sizes and real-world distributions than other multi-task models and single-variant ones. Table 9: Results on large-scale CVRPLIB instances from the X set.
Dataset Splits Yes Data Generation: To train and evaluate RouteFinder across a diverse set of VRP variants, we employ a unified data generation process detailed in Appendix A.1. ... Each model is trained for 300 epochs on 100,000 VRP instances that are generated on the fly... We evaluate all approaches on 1,000 instances of held-out test data for each size n of each variant.
Hardware Specification Yes Hardware All training runs are conducted on NVIDIA A100 GPUs and take between 9 to 24 hours per model. Evaluation is conducted on an AMD Ryzen Threadripper 3960X 24-core CPU with a single RTX 3090 GPU.
Software Dependencies No The paper mentions baselines like PyVRP (Wouda et al., 2024) and Google OR-Tools (Perron & Furnon, 2023), and uses the Adam optimizer (Kingma & Ba, 2015), but it does not specify version numbers for its own core software components like Python, PyTorch, or CUDA.
Experiment Setup Yes Training: Each model is trained for 300 epochs on 100,000 VRP instances that are generated on the fly... We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 3e-4 and batch size of 256. At epochs 270 and 295, the learning rate is multiplied by 0.1. Table 5: Experiment hyperparameters. Values with "/" indicate different choices depending on the model, i.e., on the right are values for the Transformer-based encoder.
Model: Embedding dimension 128 | Number of attention heads 8 | Number of encoder layers 6 | Use Pre-norm False / True | Normalization Instance / RMSNorm | Feedforward hidden dimension 512 | Feedforward structure MLP / Gated MLP | Feedforward activation ReLU / SwiGLU | Tanh clipping 10.0 | Mask logits True
Training: Train decode type multistart sampling | Val & Test decode type multistart greedy | Augmentation function dihedral | Batch size 256 | Train data per epoch 100,000 | Reward normalization Exponentially smoothed mean | Normalization α 0.25
Optimization: Optimizer Adam | Learning rate 3e-4 | Weight decay 1e-6 | LR scheduler MultiStepLR | LR milestones [270, 295] | LR gamma 0.1 | Gradient clip value 1.0 | Max epochs 300
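For readers checking the reported schedule, the following is a minimal sketch (not the authors' code) of the two pieces of the setup that are easy to misread from the flattened table: the MultiStepLR-style decay (base rate 3e-4, multiplied by 0.1 at epochs 270 and 295) and an exponentially smoothed mean for reward normalization with α = 0.25. The exact smoothing formula is an assumption; the paper only names the technique.

```python
def lr_at_epoch(epoch, base_lr=3e-4, milestones=(270, 295), gamma=0.1):
    """Learning rate at a given epoch under a MultiStepLR-style schedule:
    the rate is multiplied by `gamma` once per milestone already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

def smoothed_mean(prev_mean, batch_mean, alpha=0.25):
    """Exponentially smoothed running mean of batch rewards (assumed form:
    convex combination weighted by alpha, the value reported in Table 5)."""
    return alpha * batch_mean + (1 - alpha) * prev_mean

# Epochs 0-269 train at 3e-4, epochs 270-294 at 3e-5, epochs 295-299 at 3e-6.
```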