Reinforcement Learning for Solving the Vehicle Routing Problem
Authors: MohammadReza Nazari, Afshin Oroojlooy, Lawrence Snyder, Martin Takac
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments indicate that our framework performs significantly better than well-known classical heuristics designed for the VRP, and that it is robust in the sense that its worst results are still relatively close to optimal. Comparing our method with the OR-Tools VRP engine [16], which is one of the best open-source VRP solvers, we observe a noticeable improvement; in VRP instances with 50 and 100 customers, our method provides shorter tours in roughly 61% of the instances. |
| Researcher Affiliation | Academia | Mohammadreza Nazari Afshin Oroojlooy Martin Takáˇc Lawrence V. Snyder Department of Industrial and Systems Engineering Lehigh University, Bethlehem, PA 18015 {mon314,afo214,takac,lvs2}@lehigh.edu |
| Pseudocode | No | The paper describes its model and training process in detail but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to Google's OR-Tools (which is open-source) as a baseline, but there is no explicit statement or link indicating that the authors' own code for their reinforcement learning framework is open-source or publicly available. |
| Open Datasets | No | The paper states that customer and depot locations, as well as demand values, were "randomly generated from a fixed distribution" for the experiments, indicating a synthetic dataset, and does not provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions running tests on 1000 instances for various problem sizes (VRP10, VRP20, VRP50, VRP100) and discusses training, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or exact counts) needed for reproduction. |
| Hardware Specification | Yes | For example, one-by-one decoding of VRP10 for 1000 instances takes around 50 seconds, but by passing all 1000 instances to decoder at once, the total decoding time decreases to around 1 second on a K80 GPU. |
| Software Dependencies | No | The paper mentions that Google's OR-Tools is implemented in C++ and is used for comparison, but it does not specify version numbers for OR-Tools or any other software dependencies, libraries, or frameworks used in their own implementation. |
| Experiment Setup | Yes | In this experiment, we have employed two different decoders: (i) greedy, in which at every decoding step, the node (either customer or depot) with the highest probability is selected as the next destination, and (ii) beam search (BS), which keeps track of the most probable paths and then chooses the one with the minimum tour length [28]. For faster training and generating feasible solutions, we have used a masking scheme which sets the log-probabilities of infeasible solutions to or forces a solution if a particular condition is satisfied. In the VRP, we use the following masking procedures: (i) nodes with zero demand are not allowed to be visited; (ii) all customer nodes will be masked if the vehicle s remaining load is exactly 0; and (iii) the customers whose demands are greater than the current vehicle load are masked. |