LPGD: A General Framework for Backpropagation through Embedded Optimization Layers

Authors: Anselm Paulus, Georg Martius, Vít Musil

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We theoretically analyze our method and demonstrate on historical and synthetic data that LPGD converges faster than gradient descent even in a differentiable setup." and "We compare LPGD to GD in two such cases: a) Learning the rules of Sudoku from synthetic data and b) tuning the parameters of a Markowitz control policy on historical trading data."
Researcher Affiliation | Academia | "(1) University of Tübingen, Tübingen, Germany; (2) Max Planck Institute for Intelligent Systems, Tübingen, Germany; (3) Masaryk University, Brno, Czech Republic."
Pseudocode | Yes | "Algorithm 1 Forward and Backward Pass of LPGD_τ" (a hedged illustration of the perturbation idea follows below the table).
Open Source Code | Yes | "The code is available at github.com/martius-lab/diffcp-lpgd."
Open Datasets | Yes | "We consider a version of the Sudoku experiment proposed by Amos & Kolter (2017a)." and "We now consider the Markowitz Portfolio Optimization setting described by Agrawal et al. (2020, §5)."
Dataset Splits | No | For the Sudoku experiment, the paper states "The dataset consists of 9000 training and 1000 test instances." It does not mention a validation set or describe how data was split for validation (a hypothetical reconstruction of the split follows below the table). For the Markowitz control policy, no splits are specified at all.
Hardware Specification | No | The paper does not report hardware details such as GPU or CPU models, memory, or cloud instance types used to run the experiments.
Software Dependencies | No | The paper cites "CVXPY" and the "SCS solver" but does not pin version numbers (e.g., "CVXPY 1.x" or "SCS 2.x"), which prevents precise reproduction of the software environment (a version-recording snippet follows below the table).
Experiment Setup | Yes | "The best hyperparameters for LPGD are τ = 10^4, ρ = 0.1, α = 0.1, for gradient descent they are ρ = 10^-3, α = 0.1." and "When not specified otherwise, we use α = 0.001, ϵ = 0.0001, τ = 100 and ρ = 0." (The values are collected into a config sketch below.)
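
The paper's Algorithm 1 is not reproduced here. As a rough illustration of the idea it formalizes, replacing the analytic derivative of an embedded solver with a finite difference of re-solves on loss-perturbed inputs, the following PyTorch sketch may help. The toy quadratic solver, the fixed τ, and all names are our assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a perturbation-based forward/backward pass through a
# blackbox optimization layer, in the spirit of Algorithm 1 (LPGD_tau).
# The quadratic toy solver, the fixed TAU, and all names are illustrative
# assumptions, not the paper's actual implementation.
import torch

Q = torch.diag(torch.tensor([1.0, 2.0, 4.0]))  # fixed PSD matrix of the toy problem
TAU = 0.1  # finite-difference temperature (tau in the paper's notation)

def solver(w):
    # Blackbox oracle: y(w) = argmin_y 0.5 * y^T Q y - w^T y, i.e. y = Q^{-1} w.
    return torch.linalg.solve(Q, w)

class PerturbedLayer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        y = solver(w)
        ctx.save_for_backward(w, y)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        # Instead of differentiating the solver analytically, re-solve a
        # loss-perturbed instance and take a finite difference.
        w, y = ctx.saved_tensors
        y_tau = solver(w - TAU * grad_y)
        return (y - y_tau) / TAU

w = torch.randn(3, requires_grad=True)
loss = PerturbedLayer.apply(w).pow(2).sum()
loss.backward()
# Because the toy solver is linear in w, the finite difference is exact here:
expected = 2.0 * torch.linalg.solve(Q, solver(w.detach()))
print(torch.allclose(w.grad, expected))
```

The backward pass never touches solver internals; it only calls the solver again on a tilted objective, which is what lets the approach wrap nonsmooth or combinatorial solvers.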
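Since the paper gives only the 9000/1000 train/test counts for Sudoku, any split procedure is guesswork. A deterministic reconstruction might look like this; the seed and indexing scheme are pure assumptions:

```python
# Hypothetical reconstruction of the stated 9000/1000 Sudoku split;
# the paper does not describe how (or if) indices were shuffled.
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption
indices = rng.permutation(10_000)
train_idx, test_idx = indices[:9_000], indices[9_000:]
print(len(train_idx), len(test_idx))  # 9000 1000
```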
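The missing version pins could be recorded in one step. This sketch assumes the PyPI distributions are named "cvxpy" and "scs", as used by the paper's codebase:

```python
# Record solver versions for reproducibility; raises PackageNotFoundError
# if the assumed package names ("cvxpy", "scs") are not installed.
from importlib.metadata import version

for pkg in ("cvxpy", "scs"):
    print(pkg, version(pkg))
```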
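Finally, the reported hyperparameters, collected for convenience. The numeric values are taken verbatim from the paper; grouping them into dicts and the key names are our illustrative choices:

```python
# Hyperparameter values as reported in the paper; the dict structure and
# key names are illustrative, not the authors' configuration format.
LPGD_BEST = {"tau": 1e4, "rho": 0.1, "alpha": 0.1}    # best for LPGD
GD_BEST = {"rho": 1e-3, "alpha": 0.1}                 # best for gradient descent
DEFAULTS = {"alpha": 0.001, "eps": 0.0001, "tau": 100, "rho": 0.0}  # "when not specified otherwise"
```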