Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster
Authors: Sharan Vaswani, Reza Babanezhad Harikandeh
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We strengthen these results and show that if the objective function satisfies a certain nonuniform smoothness condition, GD-LS can result in a faster convergence rate than GD(1/L). In particular, we prove that for convex objectives corresponding to logistic regression and multi-class classification, GD-LS can converge to the optimum at a linear rate, and hence improves over the sublinear convergence of GD(1/L). Furthermore, for non-convex objectives satisfying gradient domination (e.g. those corresponding to the softmax policy gradient in RL or generalized linear models with a logistic link function), GD-LS can match the fast convergence of algorithms tailored for these specific settings. |
| Researcher Affiliation | Collaboration | 1Simon Fraser University 2Samsung AI, Montreal. Correspondence to: Sharan Vaswani <EMAIL>, Reza Babanezhad <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 GD with Armijo Line-search (GD-LS) 1: Input: θ0, ηmax, c (0, 1), β (0, 1) 2: for t = 0, . . . , T 1 do 3: ηt ηmax 4: while f(θt ηt f(θt)) > f(θt) c ηt f(θt) 2 2 do 5: ηt ηt β 6: end while 7: ηt ηt 8: θt+1 = θt ηt f(θt) 9: end for |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code or a link to a code repository. |
| Open Datasets | Yes | Comparing GD-LS with c = 1/2 and ηmax = 108 and GD(1/L) for unregularized logistic regression on the ijcnn dataset (Chang & Lin, 2011). |
| Dataset Splits | No | The paper mentions using the 'ijcnn dataset' and a 'synthetic separable dataset' but does not provide specific details on how these datasets were split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, used to replicate the experiments. |
| Experiment Setup | Yes | Figure 1. Comparing GD-LS with c = 1/2 and ηmax = 108 and GD(1/L) for unregularized logistic regression on the ijcnn dataset (Chang & Lin, 2011). Figure 2. Comparing GD-LS with c = 1/2, ηmax = 108 and GD(1/L) for unregularized logistic regression on a synthetic separable dataset with γ = 0.1, n = 104 and d = 200. Figure 3. Comparing GD-LS with c = 1/2, ηmax = 104 and GD(1/L) for GLM on a synthetic dataset with n = 104, d = 200, θ = 1. |