The Curse of Unrolling: Rate of Differentiating Through Optimization

Authors: Damien Scieur, Gauthier Gidel, Quentin Bertrand, Fabian Pedregosa

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5.1 Experiments on least squares objective. We compare multiple algorithms for estimating the Jacobian (OPT) of the solution of a ridge regression problem (Example (1)) for a fixed value of θ = 10⁻³. Figure 1 shows the objective and Jacobian suboptimality on a ridge regression problem with the breast-cancer dataset as the underlying data. Figure 4 shows the Jacobian suboptimality as a function of the number of iterations on the breast-cancer and bodyfat datasets, and on a synthetic dataset (where H(θ) is generated as AᵀA, with each entry of A drawn from a standard Gaussian distribution). Appendix B contains further details and experiments on a logistic regression objective.
Researcher Affiliation | Collaboration | Damien Scieur (Samsung SAIL Montreal, damien.scieur@gmail.com); Quentin Bertrand (Mila & Université de Montréal, quentin.bertrand@mila.quebec); Gauthier Gidel (Mila & Université de Montréal, Canada CIFAR AI Chair, gidelgau@mila.quebec); Fabian Pedregosa (Google Research, pedregosa@google.com)
Pseudocode | No | The paper describes its algorithms using mathematical equations and text, but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper's checklist answers [No] to: "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?"
Open Datasets | Yes | Figure 1 shows the objective and Jacobian suboptimality on a ridge regression problem with the breast-cancer dataset... Figure 4 ... on both the breast-cancer and bodyfat datasets... Footnotes: (2) https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic); (3) http://lib.stat.cmu.edu/datasets/
Dataset Splits | No | The paper uses datasets but does not explicitly provide train/validation/test split percentages, sample counts, or a detailed splitting methodology.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models or cloud instance types and their specifications.
Software Dependencies | No | The paper does not name specific software with version numbers (e.g., Python 3.8, PyTorch 1.9, TensorFlow 2.x) that would be needed to replicate the experiments.
Experiment Setup | Yes | We compare multiple algorithms for estimating the Jacobian (OPT) of the solution of a ridge regression problem (Example (1)) for a fixed value of θ = 10⁻³. The non-asymptotic algorithm is rather complicated to implement; see Appendix D. Moreover, it requires a bound on the spectrum of H(θ), namely [ℓ, L], and one also has to choose an associated expected spectral density µ(λ) (parametrized by α) and the parameter η. ... The featured two-phase curve was computed using the step size with the fastest asymptotic rate, found through a grid search over step-size values.
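Since no code was released, the quantity compared in the ridge regression experiments can be illustrated with a small sketch: the Jacobian of the ridge solution with respect to θ, computed once in closed form (implicit differentiation) and once by differentiating through unrolled gradient descent in forward mode. This is not the authors' implementation. It assumes the objective f(x, θ) = ½‖Ax − b‖² + (θ/2)‖x‖², so that H(θ) = AᵀA + θI, a Gaussian A in the spirit of the synthetic dataset described above, and the step size 1/L; the helpers `ridge_jacobian_exact` and `unrolled_gd_jacobian` are hypothetical names.

```python
import numpy as np


def ridge_jacobian_exact(A, b, theta):
    """Closed-form x*(theta) and dx*/dtheta, assuming the (hypothetical) objective
    f(x, theta) = 0.5 * ||A x - b||^2 + 0.5 * theta * ||x||^2."""
    d = A.shape[1]
    H = A.T @ A + theta * np.eye(d)          # H(theta): Hessian of f in x
    x_star = np.linalg.solve(H, A.T @ b)     # minimizer of f(., theta)
    jac_star = -np.linalg.solve(H, x_star)   # implicit function theorem, dH/dtheta = I
    return x_star, jac_star


def unrolled_gd_jacobian(A, b, theta, step, n_iter):
    """Run gradient descent on x and propagate dx_t/dtheta in forward mode."""
    d = A.shape[1]
    H = A.T @ A + theta * np.eye(d)
    c = A.T @ b
    x = np.zeros(d)      # x_0 = 0 does not depend on theta,
    jac = np.zeros(d)    # so dx_0/dtheta = 0
    history = []
    for _ in range(n_iter):
        # Differentiate x_{t+1} = x_t - step * (H x_t - c) with respect to theta.
        # The Jacobian update must use x_t, so it is computed before x changes.
        jac = jac - step * (H @ jac + x)
        x = x - step * (H @ x - c)
        history.append(jac.copy())
    return history


# Synthetic data: Gaussian A, so H(theta) = A^T A + theta * I.
rng = np.random.default_rng(0)
n, d = 50, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
theta = 1e-3

x_star, jac_star = ridge_jacobian_exact(A, b, theta)
L = np.linalg.eigvalsh(A.T @ A + theta * np.eye(d)).max()
history = unrolled_gd_jacobian(A, b, theta, step=1.0 / L, n_iter=2000)

# Jacobian suboptimality ||dx_t/dtheta - dx*/dtheta|| at each iteration.
errors = [np.linalg.norm(j - jac_star) for j in history]
print(f"initial error {errors[0]:.2e}, final error {errors[-1]:.2e}")
```

Plotting `errors` against the iteration count gives a Jacobian suboptimality curve. Under these assumptions, the error of the unrolled Jacobian contains, for each eigenvalue λ of H(θ), a term proportional to t·h·(1 − hλ)^(t−1), so it can grow during the early iterations before it decays, unlike the iterate error, which only contains (1 − hλ)^t.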
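The step-size selection mentioned in the Experiment Setup row can be sketched as a grid search. This sketch assumes plain gradient descent on the ridge quadratic, whose asymptotic rate depends only on the spectrum of H(θ) = AᵀA + θI, and scores each candidate step by that spectral rate rather than by running the method; it does not reproduce the paper's two-phase or non-asymptotic algorithms. For gradient descent, the grid minimizer should land next to the known optimal step 2/(ℓ + L), which makes the sketch easy to sanity-check. The function name `gd_asymptotic_rate` is hypothetical.

```python
import numpy as np


def gd_asymptotic_rate(step, eigvals):
    """Linear convergence factor of gradient descent on a quadratic with Hessian
    eigenvalues `eigvals`: max_i |1 - step * lambda_i|."""
    return np.max(np.abs(1.0 - step * eigvals))


# Hypothetical synthetic problem: H(theta) = A^T A + theta * I with Gaussian A.
rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((50, d))
theta = 1e-3
eigvals = np.linalg.eigvalsh(A.T @ A + theta * np.eye(d))
ell, L = eigvals.min(), eigvals.max()

# Grid search over step sizes in (0, 2/L]; steps beyond 2/L make the rate exceed 1.
grid = np.linspace(1e-4, 2.0 / L, 2000)
rates = np.array([gd_asymptotic_rate(h, eigvals) for h in grid])
best_step = grid[np.argmin(rates)]

print(f"grid-search step {best_step:.5f} vs 2/(ell+L) = {2.0 / (ell + L):.5f}")
```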