Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

Authors: Michał Dereziński, Jonathan Lacotte, Mert Pilanci, Michael W. Mahoney

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluated our theory on a range of different problems, and we have found that the more precise analysis that our theory provides describes well the convergence behavior for a range of optimization problems. In this section, we present numerical simulations illustrating this for regularized logistic regression and least squares regression, with different datasets ranging from medium to large scale: the CIFAR-10 dataset, the Musk dataset, and WESAD [SRD+18]. |
| Researcher Affiliation | Academia | Michał Dereziński, Department of Statistics, University of California, Berkeley (mderezin@berkeley.edu); Jonathan Lacotte, Department of Electrical Engineering, Stanford University (lacotte@stanford.edu); Mert Pilanci, Department of Electrical Engineering, Stanford University (pilanci@stanford.edu); Michael W. Mahoney, ICSI and Department of Statistics, University of California, Berkeley (mmahoney@stat.berkeley.edu) |
| Pseudocode | No | The paper describes mathematical derivations and algorithms but does not present them in a pseudocode or explicitly labeled algorithm block format. |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We present numerical simulations illustrating this for regularized logistic regression and least squares regression, with different datasets ranging from medium to large scale: the CIFAR-10 dataset, the Musk dataset, and WESAD [SRD+18]. |
| Dataset Splits | Yes | For each dataset, we choose the value of λ among $\{10^{-j} \mid j = 0, \dots, 8\}$ that minimizes the error on a hold-out validation set. |
| Hardware Specification | No | The paper states 'see Appendix E for hardware details', but Appendix E is not provided in the submitted text, thus specific hardware models or specifications cannot be determined. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | We use a sketch size m = d/2 for NS. In the bottom plots, we report the CPU and GPU wall-clock times to reach a $10^{-6}$-accurate solution for NS with different sketching methods. For each dataset, we choose the value of λ among $\{10^{-j} \mid j = 0, \dots, 8\}$ that minimizes the error on a hold-out validation set. For CIFAR-10 and Musk, we pick λ = $10^{-4}$. For WESAD, we pick λ = $10^{-5}$. |
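
The Experiment Setup entry refers to a Newton Sketch (NS) with sketch size m = d/2 on regularized problems. As a rough illustration only, the sketch below runs a sketched Newton iteration on ridge-regularized least squares. Several things here are assumptions and not taken from the paper: the function names `objective` and `sketched_newton`, the dense Gaussian sketch (used as a simple stand-in, not the LESS embedding from the title), the Armijo backtracking line search, and all default parameter values.

```python
import numpy as np


def objective(A, b, x, lam):
    """Ridge-regularized least squares: 0.5*||Ax - b||^2 + 0.5*lam*||x||^2."""
    r = A @ x - b
    return 0.5 * (r @ r) + 0.5 * lam * (x @ x)


def sketched_newton(A, b, lam, m=None, iters=20, seed=0):
    """Sketched Newton iteration (illustrative; Gaussian sketch as a stand-in)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = d // 2 if m is None else m  # sketch size m = d/2, as in the quoted setup
    x = np.zeros(d)
    history = [objective(A, b, x, lam)]
    for _ in range(iters):
        grad = A.T @ (A @ x - b) + lam * x
        # Fresh Gaussian sketch each iteration (assumption, not LESS).
        S = rng.standard_normal((m, n)) / np.sqrt(m)
        SA = S @ A
        # Sketched Hessian: (SA)^T (SA) + lam * I, positive definite for lam > 0.
        H_sketch = SA.T @ SA + lam * np.eye(d)
        direction = -np.linalg.solve(H_sketch, grad)
        # Armijo backtracking line search (assumption; keeps the objective decreasing).
        f0, slope, step = history[-1], grad @ direction, 1.0
        while step > 1e-12 and objective(A, b, x + step * direction, lam) > f0 + 1e-4 * step * slope:
            step *= 0.5
        if step <= 1e-12:
            break  # no acceptable step found; stop early
        x = x + step * direction
        history.append(objective(A, b, x, lam))
    return x, history
```

Calling `sketched_newton(A, b, lam)` with an n-by-d matrix `A` returns the final iterate and the objective value after each accepted step. The line search matters in this sketch because a sketch size below d can make the sketched Hessian underestimate curvature, so a unit step is not guaranteed to decrease the objective.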