Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

RouteFinder: Towards Foundation Models for Vehicle Routing Problems

Authors: Federico Berto, Chuanbo Hua, Nayeli Gast Zepeda, André Hottung, Niels Wouda, Leon Lan, Junyoung Park, Kevin Tierney, Jinkyoo Park

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate RouteFinder through extensive experiments on 48 VRP variants, i.e., three times as many as previous works, assessing the impact of each novel component on performance. Extensive experiments show RouteFinder outperforms recent state-of-the-art learning methods. Our code is publicly available at https://github.com/ai4co/routefinder.
Researcher Affiliation Collaboration KAIST; Bielefeld University; Rotterdam School of Management; VU Amsterdam; Omelet; AI4CO
Pseudocode No The paper describes methods and processes verbally and mathematically, but does not contain clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code Yes Our code is publicly available at https://github.com/ai4co/routefinder.
Open Datasets Yes Finally, Appendix D.2 reports the results for large-scale CVRPLIB, which demonstrate that RouteFinder generalizes better across sizes and real-world distributions than other multi-task models and single-variant ones. Table 9: Results on large-scale CVRPLIB instances from the X set.
Dataset Splits Yes Data Generation: To train and evaluate RouteFinder across a diverse set of VRP variants, we employ a unified data generation process detailed in Appendix A.1. ... Each model is trained for 300 epochs on 100,000 VRP instances that are generated on the fly... We evaluate all approaches on 1,000 instances of held-out test data for each size n of each variant.
Hardware Specification Yes Hardware All training runs are conducted on NVIDIA A100 GPUs and take between 9 to 24 hours per model. Evaluation is conducted on an AMD Ryzen Threadripper 3960X 24-core CPU with a single RTX 3090 GPU.
Software Dependencies No The paper mentions baselines like PyVRP (Wouda et al., 2024) and Google OR-Tools (Perron & Furnon, 2023), and uses the Adam optimizer (Kingma & Ba, 2015), but it does not specify version numbers for its own core software components like Python, PyTorch, or CUDA.
Experiment Setup Yes Training: Each model is trained for 300 epochs on 100,000 VRP instances that are generated on the fly... We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 3e-4 and batch size of 256. At epochs 270 and 295, the learning rate is multiplied by 0.1. Table 5: Experiment hyperparameters. Values with "/" indicate different choices depending on the model, i.e., on the right are values for the Transformer-based encoder.
Model: Embedding dimension 128 | Number of attention heads 8 | Number of encoder layers 6 | Use Pre-norm False / True | Normalization Instance / RMSNorm | Feedforward hidden dimension 512 | Feedforward structure MLP / Gated MLP | Feedforward activation ReLU / SwiGLU | Tanh clipping 10.0 | Mask logits True
Training: Train decode type multistart sampling | Val & Test decode type multistart greedy | Augmentation function dihedral | Batch size 256 | Train data per epoch 100,000 | Reward normalization Exponentially smoothed mean | Normalization α 0.25
Optimization: Optimizer Adam | Learning rate 3e-4 | Weight decay 1e-6 | LR scheduler MultiStepLR | LR milestones [270, 295] | LR gamma 0.1 | Gradient clip value 1.0 | Max epochs 300
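For readers checking the reported schedule, the following is a minimal sketch (not the authors' code) of the two pieces of the setup that are easy to misread from the flattened table: the MultiStepLR-style decay (base rate 3e-4, multiplied by 0.1 at epochs 270 and 295) and an exponentially smoothed mean for reward normalization with α = 0.25. The exact smoothing formula is an assumption; the paper only names the technique.

```python
def lr_at_epoch(epoch, base_lr=3e-4, milestones=(270, 295), gamma=0.1):
    """Learning rate at a given epoch under a MultiStepLR-style schedule:
    the rate is multiplied by `gamma` once per milestone already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

def smoothed_mean(prev_mean, batch_mean, alpha=0.25):
    """Exponentially smoothed running mean of batch rewards (assumed form:
    convex combination weighted by alpha, the value reported in Table 5)."""
    return alpha * batch_mean + (1 - alpha) * prev_mean

# Epochs 0-269 train at 3e-4, epochs 270-294 at 3e-5, epochs 295-299 at 3e-6.
```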