Reinforcement Learning for Solving the Vehicle Routing Problem

Authors: MohammadReza Nazari, Afshin Oroojlooy, Lawrence Snyder, Martin Takac

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our numerical experiments indicate that our framework performs significantly better than well-known classical heuristics designed for the VRP, and that it is robust in the sense that its worst results are still relatively close to optimal. Comparing our method with the OR-Tools VRP engine [16], which is one of the best open-source VRP solvers, we observe a noticeable improvement; in VRP instances with 50 and 100 customers, our method provides shorter tours in roughly 61% of the instances.
Researcher Affiliation Academia Mohammadreza Nazari Afshin Oroojlooy Martin Takáˇc Lawrence V. Snyder Department of Industrial and Systems Engineering Lehigh University, Bethlehem, PA 18015 {mon314,afo214,takac,lvs2}@lehigh.edu
Pseudocode No The paper describes its model and training process in detail but does not include any structured pseudocode or algorithm blocks.
Open Source Code No The paper refers to Google's OR-Tools (which is open-source) as a baseline, but there is no explicit statement or link indicating that the authors' own code for their reinforcement learning framework is open-source or publicly available.
Open Datasets No The paper states that customer and depot locations, as well as demand values, were "randomly generated from a fixed distribution" for the experiments, indicating a synthetic dataset, and does not provide concrete access information for a publicly available or open dataset.
Dataset Splits No The paper mentions running tests on 1000 instances for various problem sizes (VRP10, VRP20, VRP50, VRP100) and discusses training, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or exact counts) needed for reproduction.
Hardware Specification Yes For example, one-by-one decoding of VRP10 for 1000 instances takes around 50 seconds, but by passing all 1000 instances to decoder at once, the total decoding time decreases to around 1 second on a K80 GPU.
Software Dependencies No The paper mentions that Google's OR-Tools is implemented in C++ and is used for comparison, but it does not specify version numbers for OR-Tools or any other software dependencies, libraries, or frameworks used in their own implementation.
Experiment Setup Yes In this experiment, we have employed two different decoders: (i) greedy, in which at every decoding step, the node (either customer or depot) with the highest probability is selected as the next destination, and (ii) beam search (BS), which keeps track of the most probable paths and then chooses the one with the minimum tour length [28]. For faster training and generating feasible solutions, we have used a masking scheme which sets the log-probabilities of infeasible solutions to or forces a solution if a particular condition is satisfied. In the VRP, we use the following masking procedures: (i) nodes with zero demand are not allowed to be visited; (ii) all customer nodes will be masked if the vehicle s remaining load is exactly 0; and (iii) the customers whose demands are greater than the current vehicle load are masked.