Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing
Authors: Arthur Delarue, Ross Anderson, Christian Tjandraatmadja
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach against several baselines on random and standard library instances, achieving an average gap against the OR-Tools routing solver of 1.7% on moderately-sized problems. |
| Researcher Affiliation | Collaboration | Arthur Delarue, MIT Operations Research Center, Cambridge, MA (adelarue@mit.edu); Ross Anderson, Google Research, Cambridge, MA (rander@google.com); Christian Tjandraatmadja, Google Research, Cambridge, MA (ctjandra@google.com) |
| Pseudocode | No | The paper describes its algorithms and problem formulations using text and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that its implementation is in C++ and uses SCIP and Gurobi, but it does not provide any statement or link indicating that its own source code for the described methodology is open-source or publicly available. |
| Open Datasets | Yes | In Figure 1, we evaluate our performance on standard library instances from CVRPLIB [27, 38], comparing our results with those of OR-Tools. |
| Dataset Splits | No | The paper mentions holding out a portion of the data (10–20%) to evaluate out-of-sample MSE and limit overfitting during value-function training, but it does not specify explicit, reproducible train/validation/test splits for the main experimental evaluation; it instead relies on randomly generated instances or standard library instances without detailing how they are partitioned. |
| Hardware Specification | No | The paper mentions that experiments were executed 'in parallel on a large cluster of machines' but provides no specific details regarding CPU models, GPU models, memory, or other hardware specifications. |
| Software Dependencies | Yes | Our implementation is in C++ and we use SCIP 6.0.2 as a MIP solver. Average time in seconds to solve a single action selection problem, with the MIP solvers Gurobi 8.1 and SCIP 6.0.2, averaged over 50 random instances. |
| Experiment Setup | Yes | We model $\hat{V}$ as a small neural network with a fully-connected hidden layer and rectified linear unit (ReLU) activations. We train this neural network to minimize the mean-squared error (MSE) on the cumulative cost data gathered at the policy evaluation step. To limit overfitting, we randomly remove some data (between 10% and 20%) from the training set and use it to evaluate the out-of-sample MSE. Calling $(s^{(i,k_0)}, c^{(i,k_0)})$ the $i$-th data point (out of $N_{k_0}$) from iteration $k_0$, the training objective in iteration $k$ becomes $\sum_{k_0=1}^{k} \sum_{i=1}^{N_{k_0}} \gamma^{k-k_0} \big(\hat{V}(s^{(i,k_0)}) - c^{(i,k_0)}\big)^2$, where $\gamma \in (0, 1]$ denotes the retention factor. In Figure 3, we consider a neural network with 16 hidden ReLU nodes and show the effect of the batch SGD learning rate and the LASSO regularization parameter λ... |
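The training objective quoted in the Experiment Setup row can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' C++ implementation): a one-hidden-layer ReLU network standing in for the value function V̂, and a loss that down-weights data from older iterations by the retention factor γ^(k−k₀). All names, shapes, and the toy data are assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def value_net(params, states):
    """V̂(s): one fully-connected hidden layer with ReLU, then a linear output."""
    W1, b1, w2, b2 = params
    return relu(states @ W1 + b1) @ w2 + b2

def retention_weighted_mse(params, data, k, gamma=0.9):
    """Sum over past iterations k0 of gamma^(k - k0) * sum_i (V̂(s) - c)^2.

    `data` maps an iteration index k0 to (states, costs) arrays collected
    at that iteration's policy evaluation step.
    """
    total = 0.0
    for k0, (states, costs) in data.items():
        preds = value_net(params, states)
        total += gamma ** (k - k0) * np.sum((preds - costs) ** 2)
    return total

rng = np.random.default_rng(0)
state_dim, hidden = 4, 16  # 16 hidden ReLU nodes, as in the paper's Figure 3
params = (
    0.1 * rng.normal(size=(state_dim, hidden)),  # W1
    np.zeros(hidden),                            # b1
    0.1 * rng.normal(size=hidden),               # w2
    0.0,                                         # b2
)

# Toy data from two earlier iterations (k0 = 1, 2); current iteration k = 3.
data = {k0: (rng.normal(size=(8, state_dim)), rng.normal(size=8))
        for k0 in (1, 2)}
loss = retention_weighted_mse(params, data, k=3, gamma=0.9)
```

With γ = 1 this reduces to an unweighted MSE over all gathered data; smaller γ makes the objective emphasize recent iterations, which matches the paper's description of γ as a retention factor.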