Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood
Authors: Qiujiang Jin, Alec Koppel, Ketan Rajawat, Aryan Mokhtari
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on various datasets also confirm our theoretical findings. We conduct our experiments on eight datasets. |
| Researcher Affiliation | Collaboration | 1Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA. 2Amazon, Bellevue, WA, USA. 3Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur, UP, INDIA. |
| Pseudocode | Yes | Algorithm 1: Sharpened-BFGS applied to (11); Algorithm 2: General Sharpened-BFGS; Algorithm 3: The randomized Sharpened-BFGS method. (A hedged sketch of the BFGS update rules these algorithms build on appears below the table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper names the datasets used in its experiments (svmguide3, phishing, mushrooms, a9a, connect-4, w8a, protein, colon-cancer, gisette) in Table 2. However, it does not provide explicit access information such as URLs, DOIs, specific repository names, or formal citations with author names and year for these datasets. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) used in the experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | We focus on the following logistic regression problem with $\ell_2$ regularization: $\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{N}\sum_{i=1}^{N} \ln(1 + e^{-y_i z_i^\top x}) + \frac{\mu}{2}\Vert x \Vert^2$ ... The regularization parameter $\mu$ is chosen from the set $A = \{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$ to achieve the best performance. We initialize all the algorithms with the same initial point $x_0 = (1/d^{3/2})\mathbf{1}$, where $\mathbf{1} \in \mathbb{R}^d$ is the all-ones vector. We set the initial Hessian approximation matrix to $LI$ and the stepsize to 1 for all QN methods. The step size of gradient descent is set to $1/L$ to achieve its linear convergence rate on each dataset. (A minimal Python sketch of this setup appears below the table.) |
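For concreteness, here is a minimal Python sketch of the experiment setup quoted in the last row. The objective, gradient, initial point $x_0 = (1/d^{3/2})\mathbf{1}$, initial Hessian approximation $LI$, and stepsizes follow the quoted text; the random stand-in data, the particular $\mu$ value, and the smoothness estimate for $L$ are assumptions for illustration (the paper instead loads the named LIBSVM-style datasets and tunes $\mu$ per dataset).

```python
import numpy as np

def logistic_loss(x, Z, y, mu):
    # f(x) = (1/N) * sum_i ln(1 + exp(-y_i * z_i^T x)) + (mu/2) * ||x||^2
    margins = -y * (Z @ x)
    return np.logaddexp(0.0, margins).mean() + 0.5 * mu * (x @ x)

def logistic_grad(x, Z, y, mu):
    # grad f(x) = -(1/N) * Z^T (sigmoid(margins) * y) + mu * x
    margins = -y * (Z @ x)
    sigma = 1.0 / (1.0 + np.exp(-margins))
    return -(Z.T @ (sigma * y)) / len(y) + mu * x

# Stand-in random data; the paper runs on LIBSVM-style datasets instead.
rng = np.random.default_rng(0)
N, d = 200, 20
Z = rng.standard_normal((N, d))
y = rng.choice([-1.0, 1.0], size=N)

mu = 1e-3                                     # one value from {1e-5, ..., 10}
L = np.linalg.norm(Z, 2) ** 2 / (4 * N) + mu  # standard smoothness bound for this loss
x0 = np.ones(d) / d ** 1.5                    # x0 = (1/d^{3/2}) * all-ones vector
B0 = L * np.eye(d)                            # initial Hessian approximation L*I
step_qn, step_gd = 1.0, 1.0 / L               # stepsizes: 1 for QN methods, 1/L for GD
```

With this setup, the QN methods take unit steps from $x_0$ with $B_0 = LI$, while the gradient-descent baseline uses the $1/L$ step that guarantees its linear rate.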
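The three algorithms flagged in the Pseudocode row are given in the paper as pseudocode only. As a reading aid, the following is a minimal sketch, under standard quasi-Newton notation, of the two update rules that Sharpened-BFGS combines: the classical BFGS update along the iterate displacement, and a greedy update that picks the coordinate vector with the largest curvature mismatch (in the style of greedy quasi-Newton methods). The exact pairing of curvature inputs per iteration follows the paper's Algorithm 2 and is not reproduced here.

```python
import numpy as np

def bfgs_update(B, s, u):
    # Classical BFGS update of the Hessian approximation B:
    #   B_+ = B - (B s s^T B) / (s^T B s) + (u u^T) / (u^T s),
    # where s is the update direction and u is the matching curvature pair
    # (e.g. the gradient difference, or the true Hessian applied to s).
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(u, u) / (u @ s)

def greedy_vector(B, H):
    # Greedy choice among coordinate vectors e_1, ..., e_d: maximize
    # (e_i^T B e_i) / (e_i^T H e_i), where H is the true Hessian at the iterate.
    i = int(np.argmax(np.diag(B) / np.diag(H)))
    e = np.zeros(B.shape[0])
    e[i] = 1.0
    return e
```

Intuitively, the displacement-based update preserves the large local convergence neighborhood of classical BFGS, while the greedy update drives the faster superlinear rate, which is the combination the paper's title refers to.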