Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update
Authors: Michał Dereziński, Jonathan Lacotte, Mert Pilanci, Michael W. Mahoney
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our theory on a range of different problems, and we have found that the more precise analysis that our theory provides describes well the convergence behavior for a range of optimization problems. In this section, we present numerical simulations illustrating this for regularized logistic regression and least squares regression, with different datasets ranging from medium to large scale: the CIFAR-10 dataset, the Musk dataset, and WESAD [SRD+18]. |
| Researcher Affiliation | Academia | Michał Dereziński, Department of Statistics, University of California, Berkeley (mderezin@berkeley.edu); Jonathan Lacotte, Department of Electrical Engineering, Stanford University (lacotte@stanford.edu); Mert Pilanci, Department of Electrical Engineering, Stanford University (pilanci@stanford.edu); Michael W. Mahoney, ICSI and Department of Statistics, University of California, Berkeley (mmahoney@stat.berkeley.edu) |
| Pseudocode | No | The paper describes mathematical derivations and algorithms but does not present them in a pseudocode or explicitly labeled algorithm block format. |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We present numerical simulations illustrating this for regularized logistic regression and least squares regression, with different datasets ranging from medium to large scale: the CIFAR-10 dataset, the Musk dataset, and WESAD [SRD+18]. |
| Dataset Splits | Yes | For each dataset, we choose the value of λ among $\{10^{-j} \mid j = 0, \ldots, 8\}$ that minimizes the error on a hold-out validation set. |
| Hardware Specification | No | The paper states 'see Appendix E for hardware details', but Appendix E is not provided in the submitted text, thus specific hardware models or specifications cannot be determined. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | We use a sketch size m = d/2 for NS. In the bottom plots, we report the CPU and GPU wall-clock times to reach a $10^{-6}$-accurate solution for NS with different sketching methods. For each dataset, we choose the value of λ among $\{10^{-j} \mid j = 0, \ldots, 8\}$ that minimizes the error on a hold-out validation set. For CIFAR-10 and Musk, we pick $\lambda = 10^{-4}$. For WESAD, we pick $\lambda = 10^{-5}$. |
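
The Experiment Setup row describes choosing λ from a nine-point grid by hold-out validation error and using a sketch size of m = d/2. Below is a minimal sketch of that selection procedure for the least squares case, not the authors' code. It assumes the extracted grid should be read as 10^{-j} for j = 0, ..., 8, uses a closed-form ridge solve rather than the Newton Sketch, and runs on synthetic data. The function name `select_lambda`, the unnormalized objective, and the train/validation split sizes are all hypothetical.

```python
import numpy as np


def select_lambda(X_train, y_train, X_val, y_val, grid):
    """Return (lambda, error): the grid value with the lowest hold-out MSE."""
    d = X_train.shape[1]
    best_lam, best_err = None, np.inf
    for lam in grid:
        # Closed-form ridge solution of (X^T X + lam * I) w = X^T y.
        # Assumption: the paper's objective may be scaled differently.
        gram = X_train.T @ X_train + lam * np.eye(d)
        w = np.linalg.solve(gram, X_train.T @ y_train)
        err = float(np.mean((X_val @ w - y_val) ** 2))
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err


# Assumed reading of the grid: 10^{-j} for j = 0, ..., 8.
grid = [10.0 ** (-j) for j in range(9)]

# Synthetic stand-in data; the paper uses CIFAR-10, Musk, and WESAD.
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Hypothetical 150/50 train/validation split.
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

best_lam, best_err = select_lambda(X_train, y_train, X_val, y_val, grid)

# Sketch size quoted in the setup row: m = d/2.
m = d // 2
print(f"selected lambda = {best_lam:g}, hold-out MSE = {best_err:.4f}, m = {m}")
```

The selected value depends on the data and the split, so the choices reported in the table for CIFAR-10, Musk, and WESAD cannot be reproduced from this synthetic example.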