Reinforcement Learning for Route Optimization with Robustness Guarantees

Authors: Tobias Jacobs, Francesco Alesiani, Gulcin Ermis

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We validate our approach by solving min-max versions of standard benchmarks for the Capacitated Vehicle Routing and the Traveling Salesperson Problem, where our agents obtain near-optimal solutions and improve upon the baselines.
Researcher Affiliation Industry Tobias Jacobs , Francesco Alesiani , Gülcin Ermis NEC Laboratories Europe Gmb H, Kurfürsten-Anlage 36, 69115 Heidelberg, Germany tobias.jacobs@neclab.eu , francesco.alesiani@neclab.eu , gulcin.ermis@neclab.eu
Pseudocode Yes Algorithm 1 contains a pseudo-code description of the environment behavior during training, which is determined by the set of training instances or initial states, the state transition function σ, the objective f, and the baseline policy π0.
Open Source Code No The paper mentions using existing open-source frameworks like Dopamine and OpenAI Gym, but does not state that the authors' own implementation code for their methodology is made publicly available.
Open Datasets Yes Our generation method corresponds to one of the generators used in the 8th DIMACS Implementation Challenge [Johnson and Mc Geoch, 2007]. Furthermore, for the CVRP problem we test the trained agents on benchmark instance sets provided by CVRPlib; see [Uchoa et al., 2017] for more information on the instances and their origin.
Dataset Splits No The paper mentions independent training and test sets, but does not explicitly specify a validation set or detailed dataset split percentages/counts for reproduction.
Hardware Specification No The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments, only mentioning 'one hour of CPU runtime' for a baseline comparison.
Software Dependencies No The paper mentions software components like 'Dopamine framework', 'open AI gym', and a 'MIP solver', but does not provide specific version numbers for any of them.
Experiment Setup Yes The structure of QΘ follows the structure2vec architecture [Dai et al., 2016]... For performance reasons, the degree of the input graph is restricted to the 10 nearest neighbors of each node... In one variant we are masking infeasible actions using Equation 5. In the second variant each convolution layer is parameterized by individual weights... All instances of CVRP and TSP have been extended to robust (min-max) problems, where we defined uncertain distances by setting α = 0.3 and β = 2.