Accelerating Quadratic Optimization with Reinforcement Learning
Authors: Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments with well-known QP benchmarks we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications, including the QPLIB, Netlib LP and Maros-Mészáros problems. |
| Researcher Affiliation | Academia | Jeffrey Ichnowski*1, Paras Jain*1, Bartolomeo Stellato2, Goran Banjac3, Michael Luo1, Francesco Borrelli1, Joseph E. Gonzalez1, Ion Stoica1, and Ken Goldberg1. 1 University of California, Berkeley; 2 Princeton University; 3 ETH Zürich. Correspondence to: {jeffi, paras_jain}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 TD3 for ρ (scalar) ... Algorithm 2 TD3 for ρ (vector) |
| Open Source Code | Yes | Code, models, and videos are available at https://berkeleyautomation.github.io/rlqp/. |
| Open Datasets | Yes | We train with randomized QPs across various problem classes (Sec. 5) that have solutions guaranteed by construction... We also evaluate on QPLIB [14], Netlib [16], and Maros and Mészáros [32], as they are well-established benchmark problems in the optimization community. |
| Dataset Splits | No | The paper mentions train and test sets but does not explicitly provide details about a separate validation set or its split. |
| Hardware Specification | Yes | We trained on a system with 256 GiB RAM, two Intel Xeon E5-2650 v4 CPUs @ 2.20 GHz for a total of 24 cores (48 hyperthreads), and five NVIDIA Tesla V100s. We ran benchmarks on a system with Intel i9 8-core CPU @ 2.4 GHz and without GPU acceleration. |
| Software Dependencies | No | Training is performed in PyTorch with a Python wrapper around the modified OSQP, which is written in C/C++. The paper mentions PyTorch and OSQP but does not specify their version numbers. |
| Experiment Setup | Yes | In all experiments, the policy network architecture has 3 fully-connected hidden layers of 48 with ReLU activations between the input and output layers. The input layer is normalized, and the output activation is Tanh. The critic network architectures use the identity function as the output activation, but otherwise matches the policy. A hedged sketch of this architecture follows the table. |
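
Since the Experiment Setup row fully specifies the layer widths and activations, a minimal PyTorch sketch can illustrate the described policy and critic networks. This is an assumption-laden illustration, not the authors' released RLQP code: the observation and action dimensions, the use of LayerNorm as a stand-in for the "normalized" input layer, and the ρ-rescaling comment are hypothetical; only the three hidden layers of width 48, the ReLU activations, the Tanh policy output, and the identity critic output come from the quoted text.

```python
# Hedged sketch (assumptions noted below), not the authors' implementation.
import torch
import torch.nn as nn

def mlp(in_dim: int, out_dim: int, hidden: int = 48, out_act: nn.Module = nn.Identity()):
    """3 fully-connected hidden layers of width 48 with ReLU activations."""
    return nn.Sequential(
        nn.LayerNorm(in_dim),              # assumed stand-in for the normalized input layer
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim), out_act,
    )

# Hypothetical dimensions for illustration only.
obs_dim, act_dim = 6, 1
policy = mlp(obs_dim, act_dim, out_act=nn.Tanh())        # Tanh output, per the paper
critic = mlp(obs_dim + act_dim, 1, out_act=nn.Identity())  # identity output, per the paper

obs = torch.randn(32, obs_dim)                  # batch of hypothetical per-constraint features
action = policy(obs)                            # in (-1, 1); would be rescaled to a ρ range
q_value = critic(torch.cat([obs, action], dim=-1))
```

In a TD3-style setup such as the one named in the Pseudocode row, the policy above would play the role of the actor and the second network the role of a critic; how the Tanh output is mapped onto OSQP's ρ parameter is not specified in the quoted material and is left as a comment here.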