LPGD: A General Framework for Backpropagation through Embedded Optimization Layers
Authors: Anselm Paulus, Georg Martius, Vít Musil
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'We theoretically analyze our method and demonstrate on historical and synthetic data that LPGD converges faster than gradient descent even in a differentiable setup.' and 'We compare LPGD to GD in two such cases: a) Learning the rules of Sudoku from synthetic data and b) tuning the parameters of a Markowitz control policy on historical trading data.' |
| Researcher Affiliation | Academia | 1University of Tübingen, Tübingen, Germany 2Max Planck Institute for Intelligent Systems, Tübingen, Germany 3Masaryk University, Brno, Czech Republic. |
| Pseudocode | Yes | Algorithm 1 Forward and Backward Pass of LPGDτ |
| Open Source Code | Yes | The code is available at github.com/martius-lab/diffcp-lpgd. |
| Open Datasets | Yes | 'We consider a version of the Sudoku experiment proposed by Amos & Kolter (2017a).' and 'We now consider the Markowitz Portfolio Optimization setting described by Agrawal et al. (2020, 5).' |
| Dataset Splits | No | For the Sudoku experiment, the paper states 'The dataset consists of 9000 training and 1000 test instances.' It does not explicitly mention a validation set or describe how data was split for validation. For the Markowitz control policy, no specific splits are mentioned. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or specific cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions using 'CVXPY' and the 'SCS solver' with citations, but it does not specify explicit version numbers for these software dependencies (e.g., 'CVXPY 1.x' or 'SCS solver 2.x') that would allow for precise reproducibility. |
| Experiment Setup | Yes | 'The best hyperparameters for LPGD are τ = 10⁴, ρ = 0.1, α = 0.1, for gradient descent they are ρ = 10⁻³, α = 0.1.' and 'When not specified otherwise, we use α = 0.001, ϵ = 0.0001, τ = 100 and ρ = 0.' |
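The role of the temperature τ quoted above can be illustrated with a minimal toy sketch of the LPGD idea: instead of backpropagating analytically through an optimization layer, the layer is re-solved with the loss linearization added to its objective, and the gradient is recovered as a finite difference of the two solutions. Everything below (the quadratic layer, `Q`, `target`) is an illustrative assumption, not the paper's Sudoku or Markowitz setup or the authors' code.

```python
import numpy as np

# Toy quadratic "optimization layer": x*(w) = argmin_x 0.5 x^T Q x - w^T x,
# with closed-form solution x* = Q^{-1} w.  Q and the loss are assumptions
# made up for this sketch, not taken from the paper.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
Q = M @ M.T + 3.0 * np.eye(3)     # symmetric positive definite
w = rng.standard_normal(3)
target = rng.standard_normal(3)

def solve(w_vec):
    # The "forward pass" of the layer: solve the inner problem.
    return np.linalg.solve(Q, w_vec)

x_star = solve(w)
g = x_star - target               # gradient of the loss 0.5*||x - target||^2 at x*

# LPGD-style backward pass: perturb the layer's cost by tau * g, re-solve,
# and take a finite difference of the two solutions.
tau = 1e4                         # tau = 10^4, matching the table's Sudoku value
x_tau = solve(w - tau * g)
lpgd_grad = (x_star - x_tau) / tau

# Reference: exact gradient via implicit differentiation, dl/dw = Q^{-T} (x* - target).
exact_grad = np.linalg.solve(Q.T, g)

print(np.allclose(lpgd_grad, exact_grad))  # True: the update is exact for quadratics
```

For this quadratic layer the finite difference is exact for any τ; in general the perturbed re-solve only approximates the gradient, which is why τ appears as a tunable hyperparameter alongside ρ and α in the table.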