Accelerating Quadratic Optimization with Reinforcement Learning

Authors: Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments with well-known QP benchmarks we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications, including the QPLIB, Netlib LP and Maros-Mészáros problems."
Researcher Affiliation | Academia | "Jeffrey Ichnowski*¹, Paras Jain*¹, Bartolomeo Stellato², Goran Banjac³, Michael Luo¹, Francesco Borrelli¹, Joseph E. Gonzalez¹, Ion Stoica¹, and Ken Goldberg¹ (¹University of California, Berkeley; ²Princeton University; ³ETH Zürich). Correspondence to: {jeffi, paras_jain}@berkeley.edu"
Pseudocode | Yes | "Algorithm 1 TD3 for ρ (scalar)" ... "Algorithm 2 TD3 for ρ (vector)"
Open Source Code | Yes | "Code, models, and videos are available at https://berkeleyautomation.github.io/rlqp/."
Open Datasets | Yes | "We train with randomized QPs across various problem classes (Sec. 5) that have solutions guaranteed by construction... We also evaluate on QPLIB [14], Netlib [16], and Maros and Mészáros [32], as they are well-established benchmark problems in the optimization community."
Dataset Splits | No | The paper mentions train and test sets but does not explicitly provide details about a separate validation set or its split.
Hardware Specification | Yes | "We trained on a system with 256 GiB RAM, two Intel Xeon E5-2650 v4 CPUs @ 2.20 GHz for a total of 24 cores (48 hyperthreads), and five NVIDIA Tesla V100s. We ran benchmarks on a system with an Intel i9 8-core CPU @ 2.4 GHz and without GPU acceleration."
Software Dependencies | No | "Training is performed in PyTorch with a Python wrapper around the modified OSQP, which is written in C/C++." The paper mentions PyTorch and OSQP but does not specify their version numbers.
Experiment Setup | Yes | "In all experiments, the policy network architecture has 3 fully-connected hidden layers of 48 with ReLU activations between the input and output layers. The input layer is normalized, and the output activation is Tanh. The critic network architectures use the identity function as the output activation, but otherwise match the policy."
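
The Open Datasets row notes that the training QPs are randomized with solutions guaranteed by construction. As a rough illustration of that idea only, the sketch below samples a feasible point first and builds the constraint bounds around it; it is an assumption for illustration, not the authors' generators (those are defined in the paper's Sec. 5).

import numpy as np
import scipy.sparse as sparse

def random_feasible_qp(n=20, m=30, seed=0):
    # Hypothetical generator: build a QP that is feasible by construction
    # by choosing a point x0 first and placing the bounds around A @ x0.
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    P = sparse.csc_matrix(M @ M.T + 1e-2 * np.eye(n))   # PSD quadratic cost
    q = rng.standard_normal(n)
    A = sparse.csc_matrix(rng.standard_normal((m, n)))
    x0 = rng.standard_normal(n)                         # known feasible point
    Ax0 = A @ x0
    l = Ax0 - rng.uniform(0.0, 1.0, m)                  # bounds straddle A @ x0,
    u = Ax0 + rng.uniform(0.0, 1.0, m)                  # so x0 satisfies l <= A x0 <= u
    return P, q, A, l, u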
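For the Software Dependencies row: the stock OSQP Python interface (version unspecified in the paper) looks roughly as follows. The toy problem data here are made up and are not one of the paper's benchmarks; the `rho` and `adaptive_rho` settings mark the ADMM step-size that OSQP normally sets heuristically and that RLQP instead controls with a learned policy.

import numpy as np
import scipy.sparse as sparse
import osqp

# Small hand-made QP: minimize 0.5 x'Px + q'x subject to l <= Ax <= u.
P = sparse.csc_matrix([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = sparse.csc_matrix([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])

prob = osqp.OSQP()
# Fixing rho and disabling OSQP's built-in adaptation isolates the quantity
# that the RL policy would adapt during the solve.
prob.setup(P, q, A, l, u, rho=0.1, adaptive_rho=False, verbose=False)
res = prob.solve()
print(res.x, res.info.iter)   # solution and ADMM iteration count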
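The Experiment Setup row specifies the actor/critic architecture closely enough to write down. Below is a minimal PyTorch sketch of that description; the observation/action dimensions and the use of LayerNorm for the stated input normalization are assumptions, not taken from the released code.

import torch
import torch.nn as nn

def mlp(in_dim, out_dim, out_act):
    # 3 fully-connected hidden layers of 48 units with ReLU activations,
    # as stated in the Experiment Setup row.
    return nn.Sequential(
        nn.Linear(in_dim, 48), nn.ReLU(),
        nn.Linear(48, 48), nn.ReLU(),
        nn.Linear(48, 48), nn.ReLU(),
        nn.Linear(48, out_dim), out_act,
    )

class Policy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.norm = nn.LayerNorm(obs_dim)            # stand-in for the paper's input normalization
        self.net = mlp(obs_dim, act_dim, nn.Tanh())  # Tanh output activation

    def forward(self, obs):
        return self.net(self.norm(obs))

class Critic(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.norm = nn.LayerNorm(obs_dim + act_dim)
        self.net = mlp(obs_dim + act_dim, 1, nn.Identity())  # identity output activation

    def forward(self, obs, act):
        return self.net(self.norm(torch.cat([obs, act], dim=-1)))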