Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood
Authors: Qiujiang Jin, Alec Koppel, Ketan Rajawat, Aryan Mokhtari
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on various datasets also confirm our theoretical findings. We conduct our experiments on eight datasets. |
| Researcher Affiliation | Collaboration | 1Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA. 2Amazon, Bellevue, WA, USA. 3Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur, UP, INDIA. |
| Pseudocode | Yes | Algorithm 1: Sharpened-BFGS applied to (11); Algorithm 2: General Sharpened-BFGS; Algorithm 3: The randomized Sharpened-BFGS method. (A hedged sketch of the BFGS update rules these algorithms build on appears below the table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper names the datasets used in its experiments (svmguide3, phishing, mushrooms, a9a, connect-4, w8a, protein, colon-cancer, gisette) in Table 2. However, it does not provide explicit access information such as URLs, DOIs, specific repository names, or formal citations with author names and year for these datasets. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) used in the experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | We focus on the following logistic regression problem with $\ell_2$ regularization: $\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{N}\sum_{i=1}^{N} \ln(1 + e^{-y_i z_i^\top x}) + \frac{\mu}{2}\Vert x \Vert^2$ ... The regularization parameter $\mu$ is chosen from the set $A = \{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$ to achieve the best performance. We initialize all the algorithms with the same initial point $x_0 = (1/d^{3/2})\mathbf{1}$, where $\mathbf{1} \in \mathbb{R}^d$ is the all-ones vector. We set the initial Hessian approximation matrix to $LI$ and the stepsize to 1 for all QN methods. The step size of gradient descent is set to $1/L$ to achieve its linear convergence rate on each dataset. (A minimal Python sketch of this setup appears below the table.) |
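For concreteness, here is a minimal Python sketch of the experiment setup quoted in the last row. The objective, gradient, initial point $x_0 = (1/d^{3/2})\mathbf{1}$, initial Hessian approximation $LI$, and stepsizes follow the quoted text; the random stand-in data, the particular $\mu$ value, and the smoothness estimate for $L$ are assumptions for illustration (the paper instead loads the named LIBSVM-style datasets and tunes $\mu$ per dataset).

```python
import numpy as np

def logistic_loss(x, Z, y, mu):
    # f(x) = (1/N) * sum_i ln(1 + exp(-y_i * z_i^T x)) + (mu/2) * ||x||^2
    margins = -y * (Z @ x)
    return np.logaddexp(0.0, margins).mean() + 0.5 * mu * (x @ x)

def logistic_grad(x, Z, y, mu):
    # grad f(x) = -(1/N) * Z^T (sigmoid(margins) * y) + mu * x
    margins = -y * (Z @ x)
    sigma = 1.0 / (1.0 + np.exp(-margins))
    return -(Z.T @ (sigma * y)) / len(y) + mu * x

# Stand-in random data; the paper runs on LIBSVM-style datasets instead.
rng = np.random.default_rng(0)
N, d = 200, 20
Z = rng.standard_normal((N, d))
y = rng.choice([-1.0, 1.0], size=N)

mu = 1e-3                                     # one value from {1e-5, ..., 10}
L = np.linalg.norm(Z, 2) ** 2 / (4 * N) + mu  # standard smoothness bound for this loss
x0 = np.ones(d) / d ** 1.5                    # x0 = (1/d^{3/2}) * all-ones vector
B0 = L * np.eye(d)                            # initial Hessian approximation L*I
step_qn, step_gd = 1.0, 1.0 / L               # stepsizes: 1 for QN methods, 1/L for GD
```

With this setup, the QN methods take unit steps from $x_0$ with $B_0 = LI$, while the gradient-descent baseline uses the $1/L$ step that guarantees its linear rate.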
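The three algorithms flagged in the Pseudocode row are given in the paper as pseudocode only. As a reading aid, the following is a minimal sketch, under standard quasi-Newton notation, of the two update rules that Sharpened-BFGS combines: the classical BFGS update along the iterate displacement, and a greedy update that picks the coordinate vector with the largest curvature mismatch (in the style of greedy quasi-Newton methods). The exact pairing of curvature inputs per iteration follows the paper's Algorithm 2 and is not reproduced here.

```python
import numpy as np

def bfgs_update(B, s, u):
    # Classical BFGS update of the Hessian approximation B:
    #   B_+ = B - (B s s^T B) / (s^T B s) + (u u^T) / (u^T s),
    # where s is the update direction and u is the matching curvature pair
    # (e.g. the gradient difference, or the true Hessian applied to s).
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(u, u) / (u @ s)

def greedy_vector(B, H):
    # Greedy choice among coordinate vectors e_1, ..., e_d: maximize
    # (e_i^T B e_i) / (e_i^T H e_i), where H is the true Hessian at the iterate.
    i = int(np.argmax(np.diag(B) / np.diag(H)))
    e = np.zeros(B.shape[0])
    e[i] = 1.0
    return e
```

Intuitively, the displacement-based update preserves the large local convergence neighborhood of classical BFGS, while the greedy update drives the faster superlinear rate, which is the combination the paper's title refers to.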